Assume we have a 512 MB block of data and we want to prove the availability of this data using different structures. Let's consider the performance implications of the different methods when this 512 MB dataset is divided into 256-bit (32-byte) chunks.
Assume that we use Plonky2 for SNARKs and the BLS12-381 elliptic curve for polynomial commitments. For the Merkle tree, we use the SHA-256 hash function. (We chose this hash function to fix the output size; we will also review the performance impact of choosing a SNARK-friendly hash function later.)
Let's consider the resulting scenario of 16,777,216 chunks, i.e. $2^{24}$ chunks.
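For reference, the chunk count follows directly from the block size and the chunk size:

$$\frac{512 \times 2^{20}\ \text{bytes}}{32\ \text{bytes per chunk}} = \frac{2^{29}}{2^{5}} = 2^{24}\ \text{chunks}.$$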
- Direct Merkle Tree (Fraud Proof):
- Proof Size: Logarithmic in the number of chunks, so an inclusion proof is approximately 24 × hash size. With a 256-bit hash this is about 24 × 32 bytes = 768 bytes (see the sketch after this list).
- Prover Time: Building the tree takes a number of hashes linear in the number of chunks, but producing an inclusion proof from the built tree is proportional to the logarithm of the number of chunks, so it is relatively efficient compared to a linear approach (though not constant time).
- Verifier Time: Similar, proportional to the logarithm of the number of chunks: about 24 hash evaluations per queried chunk.
- Commitment Size: 256 bits (one SHA-256 root)
- RS Encoding + KZG Commitments:
- Proof Size: Constant. A single KZG opening proof is one G1 element, i.e. 48 bytes compressed on BLS12-381 (whether the global SRS parameters should also be counted here is a separate question). A toy KZG sketch follows the list below.
- Prover Time: Proportional to the size of the data, which is $2^{24}$ × chunk size.
- Verifier Time: Efficient and constant time (a single pairing check).
- Commitment Size: Constant: one G1 element, i.e. 2 × 384 bits = 96 bytes uncompressed, or 48 bytes compressed.
- Merkle Tree + Plonky2 SNARKs:
- Proof Size: Constant with respect to the data size; for Plonky2 this is typically tens of kilobytes or more, depending on the configuration and on which hash algorithm is used in the circuit.
- Prover Time: Plonky2 proving can be computationally intensive, especially for larger datasets. The time grows with the size of the data (and with the cost of proving the hash function in-circuit, where SHA-256 is expensive), so it may be relatively high for a dataset of this scale.
- Verifier Time: Verifying Plonky2 proofs is relatively efficient, especially considering the constant proof size.
- Commitment Size: 256 bits (Merkle root)
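To make the Merkle row concrete, here is a minimal sketch of a SHA-256 Merkle tree with inclusion-proof generation and verification. The leaf count is kept at $2^{10}$ purely so the demo runs quickly (an illustrative assumption); at $2^{24}$ leaves the same proof would contain 24 sibling hashes, i.e. 768 bytes.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return every level of the tree, from the hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]          # assumes a power-of-two leaf count
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect one 32-byte sibling hash per level: the inclusion proof."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Demo with 2^10 leaves; a 512 MB block at 32 bytes/chunk would have 2^24 leaves.
leaves = [i.to_bytes(32, "big") for i in range(2**10)]
levels = build_tree(leaves)
root = levels[-1][0]
proof = prove(levels, index=137)
assert verify(root, leaves[137], 137, proof)
print(len(proof), "hashes =", len(proof) * 32, "bytes")  # 10 here, 24 at 2^24 leaves
```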
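And to make the "one G1 element, one pairing check" figures concrete, here is a toy KZG commit/open/verify sketch. It assumes the pure-Python `py_ecc` library and generates the SRS from a locally known `tau`, which is insecure and only for illustration; a real deployment relies on a trusted-setup ceremony, compressed 48-byte point encodings, and a far faster pairing implementation.

```python
# Toy KZG over BLS12-381 (pure-Python pairings are slow; illustration only).
from py_ecc.bls12_381 import G1, G2, add, multiply, neg, pairing, curve_order as r

def poly_eval(coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * x^i modulo the curve order r."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % r
    return acc

# --- toy setup: INSECURE, tau is known locally; a real SRS comes from a ceremony ---
tau = 123456789
srs_g1 = [multiply(G1, pow(tau, i, r)) for i in range(8)]   # [tau^i]_1
tau_g2 = multiply(G2, tau)                                   # [tau]_2

def commit(coeffs):
    """C = sum_i coeffs[i] * [tau^i]_1 -- a single G1 point (48 bytes compressed)."""
    acc = None
    for c, base in zip(coeffs, srs_g1):
        acc = multiply(base, c % r) if acc is None else add(acc, multiply(base, c % r))
    return acc

def open_at(coeffs, z):
    """Return f(z) and a proof: a commitment to q(X) = (f(X) - f(z)) / (X - z)."""
    y = poly_eval(coeffs, z)
    q, carry = [0] * (len(coeffs) - 1), 0
    for i in range(len(coeffs) - 1, 0, -1):      # synthetic division by (X - z)
        carry = (coeffs[i] + carry * z) % r
        q[i - 1] = carry
    return y, commit(q)                           # proof is again one G1 point

def verify(C, z, y, proof):
    """Single pairing check: e(proof, [tau - z]_2) == e(C - [y]_1, [1]_2)."""
    lhs = pairing(add(tau_g2, neg(multiply(G2, z % r))), proof)
    rhs = pairing(G2, add(C, neg(multiply(G1, y % r))))
    return lhs == rhs

coeffs = [3, 1, 4, 1, 5]              # toy "chunks" used as polynomial coefficients
C = commit(coeffs)
y, proof = open_at(coeffs, z=7)
assert verify(C, 7, y, proof)
```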
The figures in the comparison above treat the data as a single piece. For data availability sampling, it is necessary to arrange the data in a matrix structure and add RS encoding in all cases (a toy sketch of this row-wise extension follows the conclusions below). Considered from this perspective, we reached the following conclusions:
- All three designs above can be applied to the general structure we designed here (the existing design already uses RS+KZG).
- The advantage of using Merkle tree + Plonky2 is that it removes the trusted setup and provides plausible post-quantum security. In return, the verifier does somewhat more work (logarithmic rather than constant). For this reason, we can start with RS+KZG and, if the quantum threat becomes a real concern in the future, switch to this solution easily, since our data structure is already suitable for applying this method.
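As a rough illustration of that row-wise extension, here is a minimal Reed-Solomon sketch over a small prime field: treat $k$ data chunks as evaluations of a degree-$<k$ polynomial and evaluate it at $k$ further points, so that any $k$ of the $2k$ coded chunks recover the original row. The field modulus, row length, and Lagrange-based encoding are illustrative assumptions; a real deployment would use the BLS12-381 scalar field, much longer rows, and FFT-based encoding.

```python
P = 2**31 - 1  # a small Mersenne prime, for illustration only

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_extend(row):
    """Extend k data chunks to 2k coded chunks: treat the data as evaluations
    of a degree-<k polynomial at x = 0..k-1 and evaluate it at x = k..2k-1."""
    k = len(row)
    pts = list(enumerate(row))
    return row + [lagrange_eval(pts, x) for x in range(k, 2 * k)]

def rs_recover(shares, k):
    """Recover the original row from any k of the 2k (index, value) shares."""
    pts = shares[:k]
    return [lagrange_eval(pts, x) for x in range(k)]

row = [19, 4, 7, 23]                        # k = 4 original chunks
coded = rs_extend(row)                      # 2k = 8 chunks; any 4 suffice
sample = [(6, coded[6]), (1, coded[1]), (3, coded[3]), (0, coded[0])]
assert rs_recover(sample, k=4) == row
```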