chapter 5.6 https://arxiv.org/pdf/1809.09044
Setup and Assumptions
Original Data Structure:
- The original data is represented as a $k×n$ matrix.
- It is extended to $k×2n$ using RS encoding with a 1/2 redundancy rate.
Distribution:
- Each column of the $k×2n$ matrix is stored on a separate subnet.
- There are $2n$ subnets, and each stores exactly one column.
Chunk Sampling Mechanism:
- A light node randomly selects a row-column coordinate $(i,j)$ and requests the corresponding chunk from the respective subnet.
- The light node needs to ensure that at least 50% of each row is available to reconstruct the original data.
Column Sampling Mechanism:
- A light node randomly selects a column coordinate $i$ and requests the corresponding column from the respective subnet.
- The light node needs to ensure that at least n columns are available to reconstruct the original data.
Main formula: $P(X \geq 1) = 1 - \prod_{i=0}^{s-1} \left( 1 - \frac{\text{number of unavailable chunks+1}}{\text{number of whole chunks} - i} \right)$
Chunk Sampling Analysis
Unrecoverability Threshold
For chunk sampling:
- A row is unrecoverable if $50\%=n$ chunks are unavailable in that row.
- The total number of chunks is $k\times2n$
If 50% of each row's chunks are unavailable:
- Total unavailable chunks = $k\times n+1$
Probability of Sampling an Unavailable Chunk
Using the formula:
$P_{\text{chunk}}(X \geq 1) = 1 - \prod_{i=0}^{s-1} \left( 1 - \frac{n+1}{k \cdot 2n - i} \right)$
Where:
- $s$: Number of samples.
- $k\times n+1$: Total unavailable chunks.