Owner: @Alexander Mozeika @Mehmet
Assumptions
- The data, organised in $K$ columns, is expanded to $N=rK$ columns, where $r=2,3,\ldots$.
- Some columns in the expanded data may be corrupted.
- The data is available if at least $K+1$ columns (any of them) are uncorrupted.
- A node samples uniformly (without replacement) $S$ columns from $N$.
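The assumptions above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `expanded_columns`, `is_available` and `sample_columns` are hypothetical and not part of any existing codebase, and columns are assumed to be identified by the indices $0,\ldots,N-1$.

```python
import random


def expanded_columns(K: int, r: int) -> int:
    """Number of columns N = r*K after expansion (r = 2, 3, ...)."""
    if r < 2:
        raise ValueError("r must be an integer >= 2")
    return r * K


def is_available(K: int, n_uncorrupted: int) -> bool:
    """Availability rule as stated above: at least K + 1 uncorrupted columns."""
    return n_uncorrupted >= K + 1


def sample_columns(N: int, S: int, rng: random.Random) -> list[int]:
    """Draw S distinct column indices uniformly (without replacement) from N."""
    # Assumption: columns are labelled 0..N-1.
    return rng.sample(range(N), S)


# Illustrative values only (not taken from the notes).
K, r, S = 4, 2, 3
N = expanded_columns(K, r)
sampled = sample_columns(N, S, random.Random(0))
```

`random.sample` draws without replacement, so the returned indices are always distinct, which matches the sampling assumption.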
Statistical testing
Setting
- The population size is $N = rK$.
- The number of uncorrupted columns in the population is $N_U$.
- The number of corrupted columns in the population is $N_C=N - N_U$.
- A sample of size $S$ is drawn randomly (without replacement) from the population.
Observed counts
- $\hat{n}_U$ is the number of uncorrupted columns in the sample.
- $\hat{n}_C = S - \hat{n}_U$ is the number of corrupted columns in the sample.
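As a rough sketch of how the observed counts could be obtained in a simulation, the snippet below draws one sample and counts corrupted and uncorrupted columns. The helper `observed_counts` and the representation of the corrupted columns as a set of indices are assumptions made for illustration, not something the notes specify.

```python
import random


def observed_counts(
    corrupted: set[int], N: int, S: int, rng: random.Random
) -> tuple[int, int]:
    """Return (n_U, n_C) for one sample of S columns drawn without replacement.

    `corrupted` holds the indices (in 0..N-1) of the N_C corrupted columns;
    this encoding is an assumption of the sketch.
    """
    sample = rng.sample(range(N), S)
    n_C = sum(1 for col in sample if col in corrupted)  # corrupted in sample
    n_U = S - n_C                                       # uncorrupted in sample
    return n_U, n_C


# Illustrative values only: N = 8 columns, of which N_C = 3 are corrupted.
n_U, n_C = observed_counts({0, 1, 2}, N=8, S=4, rng=random.Random(0))
```

By construction $\hat{n}_U + \hat{n}_C = S$ for every draw, and $\hat{n}_C$ can never exceed $N_C$.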
Distribution