Blockchain systems require that each new block of data satisfies certain state transition functions – a set of computational rules that determine whether the block’s transactions produce a valid new state. In early designs, full nodes would download every transaction and locally verify these rules, which becomes impractical as block sizes grow. Modern approaches improve scalability by using succinct proofs (e.g. SNARKs/STARKs) to convince light clients that the state transition rules hold without re-executing all transactions. However, two critical challenges remain: efficiently opening the polynomial commitments these proofs rely on, and guaranteeing that the underlying block data is available.

Succinct proof systems often rely on polynomial commitments to attest that certain polynomials (encoding transaction or witness data) evaluate to expected values, but opening these commitments incurs significant prover computation. The paper proposes a novel solution that unifies polynomial commitments with the data availability scheme, effectively obtaining polynomial commitment proofs “for free” as a side effect of ensuring data availability. This unification greatly reduces (or even eliminates) the extra work typically required to produce and verify polynomial commitment proofs in blockchain protocols.

Key Idea

The authors observe that many data availability schemes already perform an encoding of the block data (often via error-correcting codes or extension polynomials) and random sampling of that encoding to check data availability. By carefully structuring this encoding (in particular, treating the block as the evaluation of a multilinear polynomial on a boolean hypercube) and the sampling procedure, one can repurpose the same process as a polynomial commitment scheme for the block data. In short, the data availability layer itself acts as a commitment to a multilinear polynomial whose coefficients are the block’s contents, and it provides efficient methods to open (prove) polynomial evaluations with minimal overhead.
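To make the hypercube identification concrete, here is a minimal Python sketch (over a toy prime field $\mathbb{F}_{97}$, chosen only for illustration) that treats a length-$2^k$ vector as the evaluations of a multilinear polynomial on $\{0,1\}^k$ and evaluates its unique multilinear extension at an arbitrary field point:

```python
# Minimal sketch: a length-2^k vector v, indexed by k-bit strings, defines a
# unique multilinear polynomial f with f(b) = v[b] for b in {0,1}^k. Evaluating
# f at an arbitrary point r uses the standard Lagrange basis on the hypercube.
P = 97  # toy prime field, for illustration only

def mle_eval(v, r):
    """Evaluate the multilinear extension of v at a point r in F_P^k."""
    k = len(r)
    assert len(v) == 1 << k
    acc = 0
    for idx, val in enumerate(v):
        term = val
        for i in range(k):
            bit = (idx >> (k - 1 - i)) & 1  # i-th bit of idx, big-endian
            term = term * (r[i] if bit else (1 - r[i]) % P) % P
        acc = (acc + term) % P
    return acc

v = [3, 1, 4, 1, 5, 9, 2, 6]               # k = 3, so v gives f on {0,1}^3
assert mle_eval(v, [0, 1, 1]) == v[0b011]  # agrees with v on the hypercube
print(mle_eval(v, [10, 20, 30]))           # evaluation at an arbitrary point
```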

Specifically, if the block data is encoded as a multilinear polynomial’s evaluations, then a data availability check inherently involves checking random linear combinations of that polynomial’s values. This can serve as a polynomial commitment opening. In essence, the error-correcting code used for availability doubles as a polynomial commitment, so the work to encode the data and perform sampling can also prove polynomial evaluations without additional cryptographic overhead.
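The “random linear combination doubles as an opening” claim can be sanity-checked directly: the tensor-structured vector $(1-r_1, r_1)\otimes\cdots\otimes(1-r_k, r_k)$ has, as its entry for index $b$, exactly the Lagrange weight of $b$, so a single inner product with the data reproduces the multilinear evaluation. A small self-contained sketch (same toy field as above; names are illustrative):

```python
# Sketch: the tensor-structured query vector g_r = (1-r1, r1) x ... x (1-rk, rk)
# satisfies <g_r, v> = f(r), where f is the multilinear extension of v.
P = 97  # toy prime field

def tensor_point(r):
    """Kronecker product (1-r1, r1) x ... x (1-rk, rk), length 2^k."""
    g = [1]
    for ri in r:
        g = [x * c % P for x in g for c in ((1 - ri) % P, ri % P)]
    return g

def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

v = [3, 1, 4, 1, 5, 9, 2, 6]
r = [10, 20, 30]
g_r = tensor_point(r)
# One linear query against the data = one polynomial evaluation "opened".
# This equals mle_eval(v, r) from the previous sketch.
print(inner(g_r, v))
```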

Polynomial Commitments from Data Availability

The paper introduces two concrete constructions whereby a data availability scheme is leveraged as a multilinear polynomial commitment scheme for the block data.

In the first variation, the entire block’s data is committed as one polynomial, and opening proofs incur zero extra prover work, since the standard data availability encoding and sampling suffice to act as a proof.

In the second variation, one can commit to (and open) polynomials defined over subsets of the data (e.g. specific rows or portions of the block). This involves some additional prover work, but the overhead remains concretely small, because most of the heavy computation is already done during the block encoding.

In both cases, the data availability protocol “simply serves both purposes,” i.e. it provides data availability guarantees and polynomial commitment proofs simultaneously.

Technical Explanation

Multilinear Extensions in Coding: A significant insight is modeling an $N$-element data set (or a matrix of data) as the evaluations of a multilinear polynomial. A multilinear polynomial in $k$ variables has degree at most 1 in each variable (so total degree at most $k$). For example, if $n=2^k$, any length-$n$ vector $v=(v_0,\dots,v_{n-1})$ can be seen as a function $f(x_1,\dots,x_k)$ on $\{0,1\}^k$ (the $k$-bit binary inputs) by identifying each index with its binary representation, and there is a unique multilinear polynomial that interpolates those values. The paper leverages this fact in multiple dimensions: if the block is an $n'\times n$ matrix $X$, with $n=2^k$ and $n'=2^{k'}$, then $X$ defines a multilinear polynomial $f(u_1,\dots,u_{k'}; v_1,\dots,v_k)$ in $k'+k$ variables such that $f(a; b) = X_{a,b}$ for each binary tuple $a\in\{0,1\}^{k'}$, $b\in\{0,1\}^k$. The commitment to $X$ will essentially be an error-correcting codeword that allows evaluation of $f$ at arbitrary points. This connection between error-correcting codes and multilinear polynomial evaluation is a core theoretical contribution. The authors show that a ZODA encoding (applying a linear error-correcting code in the row direction and in the column direction of the matrix) naturally supports efficient multilinear evaluations. In particular, they use a tensor product of two linear codes with generator matrices $G \in \mathbb{F}^{m'\times n'}$ and $G' \in \mathbb{F}^{m\times n}$ to encode $X$ into a larger $m'\times m$ matrix $Z = G X (G')^T$. They then show that by using specially structured random vectors of the form $\bar{g}_r = (1-r_1, r_1)\otimes\cdots\otimes(1-r_k, r_k) \in \mathbb{F}^n$, for a random challenge $r=(r_1,\dots,r_k)\in \mathbb{F}^k$, one can efficiently compute random multilinear combinations of the data. Specifically, defining $\bar{g}_r$ as above, and similarly $\bar{g}'_{r'}\in \mathbb{F}^{n'}$ for a random $r'\in\mathbb{F}^{k'}$, the quantities

$y_r := X\,\bar{g}_r \in \mathbb{F}^{\,n'} \quad \text{and} \quad w_{r'} := X^T \bar{g}'_{r'} \in \mathbb{F}^{\,n}$

can be obtained by the encoder (and will be sent as part of the proof). These are essentially partial evaluations of $f$ along one axis. Then the joint evaluation $f(r'; r)$ (treating $r'$ as inputs for the first $k'$ variables and $r$ for the other $k$) is given by the inner product $(\bar{g}'_{r'})^T X\,\bar{g}_r$. The sampler (verifier) will check that this claimed value is consistent with the committed codeword. In summary, the paper provides the algebraic and protocol framework to turn any data availability encoding that supports such structured queries into a polynomial commitment scheme.
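The following self-contained sketch instantiates this pipeline end to end on toy parameters. The Vandermonde-style generator matrices are illustrative stand-ins for the paper’s codes $G$ and $G'$ (any linear code works for the algebra shown here), and the field size and dimensions are chosen for readability, not taken from the paper:

```python
# Sketch of the ZODA-style algebra on toy parameters: encode X as Z = G X G'^T,
# then answer the structured queries y_r = X g_r and w_r' = X^T g'_r'.
P = 97  # toy prime field

def tensor_point(r):
    g = [1]
    for ri in r:
        g = [x * c % P for x in g for c in ((1 - ri) % P, ri % P)]
    return g

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) % P for row in A]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % P
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def vandermonde(m, n):
    # Illustrative stand-in for a linear code's generator matrix: row a is
    # (a^0, ..., a^{n-1}), i.e. Reed-Solomon-style evaluation of a message.
    return [[pow(a, j, P) for j in range(n)] for a in range(1, m + 1)]

k, kp = 2, 2                        # n = 2^k columns, n' = 2^{k'} rows
n, n2 = 1 << k, 1 << kp
X = [[(7 * i + 3 * j + 1) % P for j in range(n)] for i in range(n2)]  # toy data

G  = vandermonde(2 * n2, n2)        # encodes columns (length n' -> m')
Gp = vandermonde(2 * n, n)          # encodes rows    (length n  -> m)
Z  = matmul(matmul(G, X), transpose(Gp))   # the availability encoding

r, rp = [5, 6], [8, 9]              # verifier challenges in F^k and F^{k'}
g_r, gp_rp = tensor_point(r), tensor_point(rp)
y_r  = matvec(X, g_r)               # partial evaluation along columns, in F^{n'}
w_rp = matvec(transpose(X), gp_rp)  # partial evaluation along rows, in F^{n}

# Both routes give the joint evaluation f(r'; r) = g'_{r'}^T X g_r.
lhs = sum(a * b for a, b in zip(gp_rp, y_r)) % P
rhs = sum(a * b for a, b in zip(w_rp, g_r)) % P
assert lhs == rhs
print("f(r'; r) =", lhs)
```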


Illustration of the encoded matrix (data $X$ encoded into $Z$) and the corresponding proofs used by the sampler to verify correctness. Here $Y$ represents the committed view of $Z$ by rows, and $W$ the view by columns. The verifier samples a few rows of $Y$ and columns of $W$ (dashed boxes) to check the codeword consistency, and also receives the vectors $y_r$ and $w_{r'}$, which are structured linear combinations of the columns and rows of $X$, respectively. These are used to verify a random evaluation of the multilinear polynomial.

The commitment to polynomial $f$ is implicitly given by the commitments to $Z$ (the encoded data), and the opening proof for the evaluation $f(r'; r)$ consists of the responses $y_r, w_{r'}$ and the revealed small pieces of $Y$ and $W$ at the sampled index sets $S, S'$. The verifier’s checks ensure the opening is valid. Importantly, the prover (encoder) did not do any extra work beyond what data availability required: computing $Z$ (which it already had to do for encoding) and responding to a few linear queries $y_r, w_{r'}$ (which are trivial given $Z$ or $X$) is all that was needed. No separate cryptographic commitment or expensive proof was generated for the polynomial – it was “accidentally” accomplished by the data availability procedure. Hence, the first variation achieves a polynomial commitment over the entire block with essentially zero additional prover overhead.
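A minimal sketch of the verifier’s algebraic check on an opening, under the same toy setup as above (the full protocol’s codeword-consistency checks against the sampled pieces of $Y$ and $W$ are omitted here): given the responses $y_r$ and $w_{r'}$, the verifier can recompute the claimed evaluation two independent ways and insist they agree.

```python
# Sketch: the verifier's cross-check on an opening. It recomputes the claimed
# evaluation from y_r (column route) and from w_r' (row route) and compares.
# The consistency checks against the sampled sets S, S' are omitted here.
P = 97  # toy prime field

def tensor_point(r):
    g = [1]
    for ri in r:
        g = [x * c % P for x in g for c in ((1 - ri) % P, ri % P)]
    return g

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

def check_opening(r, rp, y_r, w_rp, claimed_eval):
    """Accept only if both query responses yield the same claimed f(r'; r)."""
    g_r, gp_rp = tensor_point(r), tensor_point(rp)
    return dot(gp_rp, y_r) == claimed_eval == dot(w_rp, g_r)
```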

State Transition Functions

How does this help with verifying state transition functions in a blockchain? A state transition function can be seen as an arbitrary computation or predicate $C(X) = \text{true/false}$ on the block data $X$ (for example, “all transactions are valid and result in correct state updates”). To avoid re-executing the transactions in $X$, one can use a succinct proof that $C(X)$ holds. Many general-purpose succinct proofs (like the GKR protocol, or SNARKs) ultimately reduce the task of verifying $C(X)$ to one or more polynomial evaluations on $X$. For instance, the GKR protocol for verifying an $N$-step arithmetic circuit builds a multilinear extension of the inputs and reduces the statement to checking that a certain multilinear polynomial (related to the circuit’s computation trace) evaluated at a random point equals the claimed output $z$. Concretely, GKR reduces verifying $C(X)=z$ to evaluating an expression of the form:

$\bigl((1-r'_1,\, r'_1)\otimes \cdots \otimes (1-r'_{k'},\, r'_{k'})\bigr)^T \; X \; \bigl((1-r_1,\, r_1)\otimes \cdots \otimes (1-r_k,\, r_k)\bigr)$
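This is exactly the bilinear form $(\bar{g}'_{r'})^T X\,\bar{g}_r = f(r'; r)$ defined earlier, i.e. precisely the quantity the data availability layer can open with no extra prover work. A toy check in the same illustrative Python setup as the earlier sketches:

```python
# Sketch: the GKR-style target value is the same bilinear form the DA layer opens.
P = 97  # toy prime field

def tensor_point(r):
    g = [1]
    for ri in r:
        g = [x * c % P for x in g for c in ((1 - ri) % P, ri % P)]
    return g

def bilinear(gp, X, g):
    """Compute gp^T X g over F_P."""
    return sum(gp[i] * sum(X[i][j] * g[j] for j in range(len(g)))
               for i in range(len(gp))) % P

X = [[(7 * i + 3 * j + 1) % P for j in range(4)] for i in range(4)]
val = bilinear(tensor_point([8, 9]), X, tensor_point([5, 6]))
print("claimed f(r'; r) for GKR:", val)  # openable via the DA layer's y_r, w_r'
```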