ZODA Protocol Details

ZODA focuses on ensuring data availability through a three-step process:

Encoding: The data matrix is encoded with a code $G$ (e.g., Reed–Solomon) to give $X$. A secondary encoding $Y$ is produced by scaling the columns of the original data matrix by random factors and encoding its rows with a second code $G'$.

Sampling: Samplers verify correctness by sampling rows from $X$ and columns from $Y$, checking for consistency with the fully encoded matrix $Z$.

Decoding: After successful sampling, nodes can reconstruct the original data from the sampled rows of $X$ and columns of $Y$ using erasure decoding.
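
The Decoding step can be illustrated with a minimal sketch of erasure decoding. It makes several assumptions that are not part of the protocol description: a toy prime field with $P = 97$, a Vandermonde generator matrix standing in for the Reed–Solomon code $G$, and recovery from rows of $X$ only (the columns of $Y$ are ignored here). The helper names `vandermonde`, `matmul`, `solve_mod`, and `decode_from_rows` are hypothetical.

```python
P = 97  # small prime field F_P, chosen only for illustration

def vandermonde(m, n):
    """m x n generator matrix: entry (i, j) is (i + 1)^j mod P."""
    return [[pow(i + 1, j, P) for j in range(n)] for i in range(m)]

def matmul(A, B):
    """Matrix product over F_P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)]
            for row in A]

def solve_mod(A, B):
    """Solve A @ T = B over F_P for a square invertible A (Gauss-Jordan)."""
    n = len(A)
    M = [[v % P for v in list(a) + list(b)] for a, b in zip(A, B)]
    for col in range(n):
        # Find a row with a nonzero pivot (fails if A is singular).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], P - 2, P)  # inverse via Fermat's little theorem
        M[col] = [(v * inv) % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % P for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def decode_from_rows(G, S, X_S):
    """Recover the data matrix from the rows of X = G @ X_tilde indexed by S."""
    G_S = [G[i] for i in S]  # rows of G matching the surviving rows of X
    return solve_mod(G_S, X_S)

X_tilde = [[3, 5], [7, 11]]          # toy 2 x 2 data matrix
G = vandermonde(4, 2)                # rate-1/2 code: 4 encoded rows from 2
X = matmul(G, X_tilde)
S = [1, 3]                           # rows 0 and 2 are treated as erased
recovered = decode_from_rows(G, S, [X[i] for i in S])
assert recovered == X_tilde
```

Any $n$ distinct rows suffice in this sketch because every $n \times n$ submatrix of a Vandermonde matrix with distinct evaluation points is invertible.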

1. Encoding Algorithm

This step ensures that the data is encoded in a way that each row and column acts as proof of correct encoding.

Detailed Steps:

Input Data: The original data is arranged as a matrix $\tilde{X} \in F^{n \times n}$.
Encode Rows:
- Apply the Reed–Solomon code $G$ to each column of the original data matrix $\tilde{X}$ (left-multiplication by $G$) to get $X = G\tilde{X}$.
- Commit to the rows of $X$ using a Merkle tree (each row is a leaf).
Random Sampling:
- A random number $r$ is generated (via Fiat–Shamir or interactively) and shared publicly.
- Construct a diagonal matrix $D_r = \operatorname{diag}(g_r)$, where $g_r \in F^n$ is a vector of random field elements derived from $r$.
Encode Columns:
- Scale the columns of $\tilde{X}$ by $D_r$ and encode the rows with $G'$ to get $Y = \tilde{X} D_r G'^T$.
- Commit to the columns of $Y$.
Final Encoding:
- Compute the full encoding $Z = G \tilde{X} D_r G'^T = G Y = X D_r G'^T$ and commit to the entire matrix $Z$.
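
The encoding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a toy prime field with $P = 97$, Vandermonde matrices standing in for the Reed–Solomon codes, $G' = G$, and a hypothetical `challenge_vector` helper that hashes $r$ with SHA-256 to obtain $g_r$. The protocol description does not fix any of these choices, and the commitments are omitted.

```python
import hashlib

P = 97  # small prime field F_P, chosen only for illustration

def vandermonde(m, n):
    """m x n generator matrix: entry (i, j) is (i + 1)^j mod P."""
    return [[pow(i + 1, j, P) for j in range(n)] for i in range(m)]

def matmul(A, B):
    """Matrix product over F_P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def scale_columns(A, g):
    """Right-multiply A by diag(g): column j is scaled by g[j]."""
    return [[(a * gj) % P for a, gj in zip(row, g)] for row in A]

def challenge_vector(r, n):
    """Hypothetical derivation of g_r from r: n nonzero field elements."""
    out = []
    for j in range(n):
        digest = hashlib.sha256(r + j.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(digest, "big") % (P - 1) + 1)
    return out

def encode(X_tilde, g_r, m):
    n = len(X_tilde)
    G = vandermonde(m, n)
    G_prime = vandermonde(m, n)           # assumption: same code for G'
    X = matmul(G, X_tilde)                # X = G X~         (m x n)
    Y = matmul(scale_columns(X_tilde, g_r), transpose(G_prime))  # n x m
    Z = matmul(G, Y)                      # Z = G Y          (m x m)
    return G, G_prime, X, Y, Z

n, m = 2, 4
X_tilde = [[3, 5], [7, 11]]
r = b"example-challenge"                  # stand-in for the public randomness
g_r = challenge_vector(r, n)
G, G_prime, X, Y, Z = encode(X_tilde, g_r, m)

# Both routes to Z agree: G Y == X D_r G'^T.
assert Z == matmul(scale_columns(X, g_r), transpose(G_prime))
```

The final assertion checks the identity that the sampling step relies on: $Z$ can be reached either from the columns of $Y$ (via $G$) or from the rows of $X$ (via $D_r$ and $G'$).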

Key Insight: Rows encode columns, and columns encode rows, creating redundancy that allows verification with minimal overhead.
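
The row commitment from the Encode Rows step can be illustrated with a small Merkle-root helper. This is a sketch, not the protocol's specified commitment scheme: SHA-256, the 8-byte big-endian encoding of field elements, the `0x00`/`0x01` domain-separation prefixes, and duplicating the last node on odd levels are all assumptions made for the example.

```python
import hashlib

def leaf_hash(row):
    """Hash one matrix row (a list of small non-negative integers) as a leaf."""
    data = b"".join(int(v).to_bytes(8, "big") for v in row)
    return hashlib.sha256(b"\x00" + data).digest()

def merkle_root(rows):
    """Merkle root over the rows of a matrix; each row is one leaf."""
    if not rows:
        raise ValueError("cannot commit to an empty matrix")
    level = [leaf_hash(row) for row in rows]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Toy encoded matrix X; committing to columns of Y works the same way
# after transposing Y so that each column becomes one leaf.
X = [[10, 16], [17, 27], [24, 38], [31, 49]]
root = merkle_root(X)

tampered = [row[:] for row in X]
tampered[2][0] += 1
assert merkle_root(tampered) != root
```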

2. Sampling Algorithm

This step allows any node to verify that the encoding is correct by sampling a small number of rows and columns.

Detailed Steps:

Sampling Sets:
- Randomly choose a set $S$ of row indices and a set $S'$ of column indices.
Request Samples:
- Request $X_S$ (the selected rows of $X$) and $Y_{S'}$ (the selected columns of $Y$).
Verification Checks:
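- Check that each sampled row of $X$, scaled by $D_r$ and encoded with $G'$, agrees with the encoding of each sampled column of $Y$ under $G$:
  
  Check that $X_S D_r g'_j = G_S y_j$ for each $j \in S'$, where $g'_j$ is the $j$-th row of $G'$ (written as a column vector), $y_j$ is the $j$-th column of $Y$, and $G_S$ denotes the rows of $G$ indexed by $S$.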
- Verify that each sampled column of $Y$ matches the expected encoding in $Z$.
  
  Verify that $Z_{ij} = g_i^T y_j$ for all $i \in S$ and $j \in S'$, where $g_i^T$ is the $i$-th row of $G$ and $y_j$ is the $j$-th column of $Y$.
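
The verification checks above can be sketched as follows. The example assumes a toy prime field with $P = 97$, Vandermonde matrices for both codes with $G' = G$, a fixed vector `g_r` standing in for the public randomness, and fixed sample sets. The helper `verify_samples` is hypothetical, and checking the samples against the Merkle commitments is omitted.

```python
P = 97  # small prime field F_P, chosen only for illustration

def vandermonde(m, n):
    """m x n generator matrix: entry (i, j) is (i + 1)^j mod P."""
    return [[pow(i + 1, j, P) for j in range(n)] for i in range(m)]

def matmul(A, B):
    """Matrix product over F_P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)]
            for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

def verify_samples(G, G_prime, g_r, X_rows, Y_cols):
    """X_rows maps i -> row i of X; Y_cols maps j -> column j of Y.

    For every sampled pair (i, j), compare (X D_r G'^T)_ij, computed from
    the row, with g_i^T y_j, computed from the column.
    """
    for j, y_j in Y_cols.items():
        # D_r g'_j: row j of G' scaled entrywise by g_r
        scaled = [(g * c) % P for g, c in zip(g_r, G_prime[j])]
        for i, x_i in X_rows.items():
            if dot(x_i, scaled) != dot(G[i], y_j):
                return False
    return True

# Honest encoding of a toy 2 x 2 data matrix.
n, m = 2, 4
X_tilde = [[3, 5], [7, 11]]
g_r = [2, 5]                               # stand-in for the random vector
G = vandermonde(m, n)
G_prime = vandermonde(m, n)                # assumption: G' = G
X = matmul(G, X_tilde)
scaled_data = [[(x * g) % P for x, g in zip(row, g_r)] for row in X_tilde]
Y = matmul(scaled_data, [list(c) for c in zip(*G_prime)])  # X~ D_r G'^T

S, S_prime = [1, 3], [0, 2]                # sampled row / column indices
X_rows = {i: X[i] for i in S}
Y_cols = {j: [row[j] for row in Y] for j in S_prime}
honest_ok = verify_samples(G, G_prime, g_r, X_rows, Y_cols)

# Corrupt one entry of a sampled row; the cross-check should now fail.
bad_rows = dict(X_rows)
bad_rows[1] = [(X[1][0] + 1) % P, X[1][1]]
tampered_ok = verify_samples(G, G_prime, g_r, bad_rows, Y_cols)
```

In this sketch the right-hand side `dot(G[i], y_j)` is the value $g_i^T y_j$, so a sampler holding entries of $Z$ could compare it with $Z_{ij}$ directly.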