This document compares the tensor and Hadamard variants of ZODA in the context of Nomos DA and estimates their dissemination and sampling overhead under a concrete parameter set.
In the tensor construction, the data matrix $\tilde X$ is encoded in both dimensions with a tensor product code (e.g. Reed–Solomon on rows and columns). If $\tilde X$ is $n \times n$, the encoded matrix $Z$ is about $2n \times 2n$ for rate $1/2$, so there is a $4\times$ expansion in the number of symbols. The commitments bind all rows and all columns, and the ZODA sampling algorithm checks that randomly chosen rows and columns are mutually consistent, which implies that the whole matrix is close to a single valid codeword. This construction is a drop-in fit for “classical” 2D RS DA schemes (tensor codes) and supports distributed decoding across rows and columns, at the cost of a larger encoded size and higher per-node bandwidth.
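The size accounting can be sketched in a few lines. This is a toy illustration only: a rate-$1/2$ repetition code stands in for Reed–Solomon (real ZODA uses RS), but the symbol-count expansion is identical.

```python
# Toy tensor encoding sketch. Assumption: a systematic rate-1/2 repetition
# "code" (data followed by a copy as parity) replaces Reed-Solomon; the
# n x n -> 2n x 2n shape and the 4x symbol expansion are the same.

def extend(vec):
    # Systematic rate-1/2 extension: original symbols, then parity symbols.
    return vec + vec

def tensor_encode(X):
    # Encode every row, then every column of the row-encoded matrix.
    rows_encoded = [extend(row) for row in X]          # n x 2n
    cols = list(zip(*rows_encoded))                    # transpose to columns
    cols_encoded = [extend(list(c)) for c in cols]     # each column -> length 2n
    return [list(r) for r in zip(*cols_encoded)]       # back to row-major, 2n x 2n

n = 4
X = [[i * n + j for j in range(n)] for i in range(n)]
Z = tensor_encode(X)
print(len(Z), len(Z[0]))   # 8 8 -> 2n x 2n, i.e. a 4x expansion in symbols
```

Because both per-dimension codes are systematic, the original $\tilde X$ survives as the top-left $n \times n$ corner of $Z$.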
In the Hadamard construction, we do not insist on a full tensor codeword. Instead, we encode $\tilde X$ in one direction (or via a smaller number of random linear projections), and we add a small number of random linear checks (Hadamard-style) that certify that the opened pieces come from a unique consistent encoding. Intuitively, we still get “this is an encoding of some unique data”, but with less encoded data per node and possibly smaller fields, at the cost of a bit of extra “proof-only” communication and losing the clean tensor structure. It is cheaper for dissemination (less redundancy) and per-node bandwidth, but decoding is less tensor-friendly.
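To see why the “proof-only” overhead stays small, consider the column direction: a full tensor code adds $n$ parity rows, whereas a handful of random linear combinations of the encoded rows can serve as consistency checks. The sketch below is an assumption-laden illustration (random $\pm 1$ combinations stand in for the scheme's actual random linear checks; `hadamard_checks` is a hypothetical helper name).

```python
import random

# Sketch of Hadamard-style column checks. Assumption: s random +/-1 linear
# combinations of the encoded rows play the role of the random linear checks;
# the point is that s << n check rows replace the n parity rows that a full
# tensor code would add in the column direction.

def hadamard_checks(rows_encoded, s, seed=0):
    rng = random.Random(seed)
    checks = []
    for _ in range(s):
        coeffs = [rng.choice([1, -1]) for _ in rows_encoded]
        checks.append([sum(c * row[j] for c, row in zip(coeffs, rows_encoded))
                       for j in range(len(rows_encoded[0]))])
    return checks

n = 64
rows = [[(i + j) % 7 for j in range(2 * n)] for i in range(n)]  # n rows, rate-1/2 width
checks = hadamard_checks(rows, s=4)
print(len(checks), len(checks[0]))   # 4 check rows of width 2n, vs n parity rows
```

The dissemination saving in the comparison below comes exactly from this substitution: check rows scale with the (small) number of checks $s$, not with $n$.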
If we only need row-wise reconstruction and we sample “many rows + one column” under tensor ZODA, the column dimension is used purely as a consistency witness — not for reconstruction. In that regime, the tensor construction behaves like a “heavy-weight Hadamard”: the column is a full RS codeword (built from a 2D tensor code), whereas Hadamard ZODA would use only a handful of random linear projections in the column direction. The correctness guarantees are similar (“these rows come from a unique encoded matrix”), but tensor ZODA pays extra in dissemination and storage (about $4\times$ vs $\approx 2\times$) for a column structure that we never actually use to decode. The only real advantage of keeping the full tensor in this case is compatibility and future flexibility; from a pure DA + row-decoding perspective, Hadamard ZODA is the leaner construction.
To make the overhead concrete, we use the following parameter set.

- Block size for DA data: $B_{\text{block}} = 1\ \text{GiB}$ of blob data per block.
- Blob sizes: $B \in \{8,\ 16,\ 32\}\ \text{MiB}$.
- Number of blobs per block: $B_{\text{block}} / B \in \{128,\ 64,\ 32\}$.
- Matrix layout (for ZODA): each blob is arranged as a square $n \times n$ matrix $\tilde X$.
- Code rate in each dimension: $1/2$.
- Consensus / DA nodes: total consensus validators $V = 1000$; the fraction that are DA nodes is $V_{\text{DA}} = V/4 = 250$.
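The per-blob arithmetic under this parameter set is easy to tabulate. The $4\times$ tensor expansion and the $\approx 2\times$ Hadamard expansion are taken from the rate-$1/2$ discussion above; the small Hadamard check overhead is ignored here.

```python
# Per-blob arithmetic for the chosen parameter set. Assumption: tensor
# expansion is exactly 4x and Hadamard expansion is ~2x (rate 1/2, ignoring
# the small "proof-only" check rows of the Hadamard construction).

GIB = 1024                 # work in MiB throughout
B_block = 1 * GIB          # 1 GiB of blob data per block

for B in (8, 16, 32):
    blobs = B_block // B
    print(f"B={B} MiB: {blobs} blobs/block, "
          f"tensor encoded {4 * B} MiB/blob, hadamard ~{2 * B} MiB/blob")
```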
Sampling policy for this note
For “overhead” we measure:
$\text{inefficiency} = \frac{\text{total network bytes sent}}{\text{raw blob bytes}}$
Here we only consider the cost of encoding the blob data and sending encoded shares to DA nodes, ignoring any sampling traffic. We assume minimal replication: each encoded share is sent to exactly one DA node (no extra gossip factor).
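Under these assumptions the inefficiency is just the code expansion factor, and the per-node load follows from dividing the encoded bytes among the DA nodes. A minimal sketch (assuming exactly $4\times$ tensor and $\approx 2\times$ Hadamard expansion, minimal replication, and no sampling traffic):

```python
# Dissemination inefficiency per the metric above. Assumptions: each encoded
# share goes to exactly one of the V_DA nodes, sampling traffic is ignored,
# and the expansion factors are 4x (tensor) and ~2x (Hadamard).

V_DA = 250        # DA nodes
raw_gib = 1.0     # raw blob bytes per block, in GiB

for name, expansion in [("tensor", 4.0), ("hadamard", 2.0)]:
    sent = raw_gib * expansion            # total network GiB sent per block
    inefficiency = sent / raw_gib         # = expansion under minimal replication
    per_node_mib = sent * 1024 / V_DA     # MiB received per DA node per block
    print(f"{name}: inefficiency {inefficiency:.0f}x, "
          f"~{per_node_mib:.1f} MiB per DA node per block")
```

So for a 1 GiB block, tensor ZODA disseminates about 16.4 MiB to each of the 250 DA nodes, versus roughly half that for the Hadamard construction.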