Owner: @Alexander Mozeika @Mehmet @Marcin Pawlowski

Assumptions

The data, organised in $K$ columns, is expanded (encoded) into $N=rK$ columns, where $r=2,3,\ldots$.
Some of the columns in the expanded data can be corrupted.
Here we treat each column as a segment in some sort of “flash memory” device. For now we assume only that data cannot be “read” from a damaged (or “corrupted”) segment.
The data is recoverable if at least $K+1$ columns (any of them) are uncorrupted.
A node samples $S$ of the $N$ columns uniformly at random, without replacement.
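The assumptions above can be illustrated with a minimal Python sketch. It assumes columns are indexed $0,\ldots,N-1$ and that the set of corrupted indices is known (it is a hypothetical input here). `is_recoverable`, `sample_columns` and `count_uncorrupted` are illustrative names, not part of any existing codebase. Uniform sampling without replacement is done with the standard library's `random.sample`, under which every $S$-subset of columns is equally likely.

```python
import random


def is_recoverable(K: int, r: int, corrupted: set[int]) -> bool:
    """True if the data can be retrieved, i.e. at least K + 1 of the
    N = r*K columns (indices 0..N-1) are uncorrupted."""
    N = r * K
    if any(not 0 <= c < N for c in corrupted):
        raise ValueError("corrupted column index out of range")
    return N - len(corrupted) >= K + 1


def sample_columns(N: int, S: int, rng: random.Random) -> list[int]:
    """A node's sample: S distinct column indices drawn uniformly at
    random without replacement from range(N)."""
    return rng.sample(range(N), S)


def count_uncorrupted(sample: list[int], corrupted: set[int]) -> int:
    """Number of sampled columns that are not corrupted."""
    return sum(1 for c in sample if c not in corrupted)


if __name__ == "__main__":
    K, r, S = 4, 2, 3            # illustrative values
    N = r * K                    # 8 encoded columns
    corrupted = {1, 5, 6}        # hypothetical corrupted columns
    rng = random.Random(0)       # fixed seed for reproducibility
    print(is_recoverable(K, r, corrupted))   # True: 8 - 3 = 5 >= K + 1 = 5
    sample = sample_columns(N, S, rng)
    print(sample, count_uncorrupted(sample, corrupted))
```

With $K=4$ and $r=2$, removing a fourth column from the uncorrupted set (e.g. `corrupted = {0, 1, 2, 3}`) leaves only 4 readable columns, so `is_recoverable` returns `False`.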

The encoded data is stored in $rK$ columns (represented by rectangles) and can be retrieved from any $K+1$ uncorrupted columns (blue rectangles), i.e. at most $K(r-1)-1$ columns can be corrupted (red rectangles).
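The bound $K(r-1)-1$ follows directly from the recoverability condition: the largest number of corrupted columns that still leaves $K+1$ readable ones out of $N=rK$ is

$$
N - (K+1) = rK - K - 1 = K(r-1) - 1 .
$$

For example, with $K=4$ and $r=2$ the data survives up to $4\cdot 1 - 1 = 3$ corrupted columns out of $N=8$.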

Statistical testing

Setting

The population size is $N = rK$.
The number of uncorrupted columns in the population is $N_U$.
The number of corrupted columns in the population is $N_C=N - N_U$.
A sample of size $S$ is drawn uniformly at random (without replacement) from the population.
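Under this setting, the number of uncorrupted columns in the sample, $\hat{n}_U$ (defined under Observed counts), follows the hypergeometric distribution, the standard model for sampling without replacement:

$$
\Pr(\hat{n}_U = k) = \frac{\binom{N_U}{k}\binom{N_C}{S-k}}{\binom{N}{S}}, \qquad \max(0,\, S - N_C) \le k \le \min(S,\, N_U),
$$

with mean $\mathbb{E}[\hat{n}_U] = S N_U / N$.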

Observed counts

$\hat{n}_U$ is the number of uncorrupted columns in the sample.
$\hat{n}_C = S - \hat{n}_U$ is the number of corrupted columns in the sample.
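For a given (hypothetical) value of $N_U$, the probabilities of the observed counts can be computed exactly. The following is a minimal Python sketch, assuming the sample is drawn uniformly without replacement so that $\hat{n}_U$ is hypergeometric. It uses the standard library's `math.comb` and also evaluates $\Pr(\hat{n}_C = 0)$, the chance that the sample contains no corrupted column at all. The function names are illustrative only.

```python
from math import comb


def pmf_uncorrupted(k: int, N: int, N_U: int, S: int) -> float:
    """P(n_hat_U = k): probability that exactly k of the S sampled columns
    are uncorrupted, when N_U of the N columns are uncorrupted and the
    sample is drawn uniformly without replacement (hypergeometric law)."""
    N_C = N - N_U
    # Outside the support the probability is zero (also avoids negative comb args).
    if k < 0 or k > S or k > N_U or S - k > N_C:
        return 0.0
    return comb(N_U, k) * comb(N_C, S - k) / comb(N, S)


def prob_no_corrupted_seen(N: int, N_U: int, S: int) -> float:
    """P(n_hat_C = 0): every sampled column is uncorrupted."""
    return pmf_uncorrupted(S, N, N_U, S)


if __name__ == "__main__":
    K, r, S = 4, 2, 3
    N = r * K            # 8 columns
    N_U = 5              # hypothetical: 3 corrupted columns
    total = sum(pmf_uncorrupted(k, N, N_U, S) for k in range(S + 1))
    print(total)                              # equals 1 up to floating-point rounding
    print(prob_no_corrupted_seen(N, N_U, S))  # C(5,3)/C(8,3) = 10/56
```

In this example the node sees no corrupted column with probability $10/56$, even though 3 of the 8 columns are corrupted.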