Assume we have 512 MB of block data and we want to prove the availability of this data using different structures. Let's consider the performance implications of each method when the 512 MB dataset is divided into 256-bit (32-byte) chunks.

Assume that we use Plonky2 for SNARKs and the BLS12-381 elliptic curve for polynomial commitments. For the Merkle tree, we use the SHA-256 hash function. (We chose this hash function to fix the output size; we will also review the performance impact of choosing a SNARK-friendly hash function later.)
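To make the commitment setting concrete, here is a minimal KZG-style commitment sketch over BLS12-381 using the py_ecc library. Everything in it is an illustrative assumption: the toy trusted setup keeps the secret tau visible (a real ceremony discards it), the polynomial is tiny, and names such as `kzg_commit` are ours, not from the text above.

```python
# Hedged sketch: a KZG-style polynomial commitment over BLS12-381 with py_ecc.
# The "trusted setup" below is deliberately insecure (tau is visible); it only
# illustrates why the commitment is a single 48-byte G1 point.
from py_ecc.bls12_381 import G1, add, multiply, curve_order

TAU = 123456789     # toy secret; a real ceremony discards this value
MAX_DEGREE = 3      # toy size; a 512 MB blob needs a far larger setup

# Powers of tau in G1: [G1, tau*G1, tau^2*G1, ...]
SETUP_G1 = [multiply(G1, pow(TAU, i, curve_order)) for i in range(MAX_DEGREE + 1)]

def kzg_commit(coeffs):
    """C = sum_i coeffs[i] * tau^i * G1: one group element,
    independent of the size of the committed data."""
    acc = None  # py_ecc represents the point at infinity as None
    for c, s in zip(coeffs, SETUP_G1):
        acc = add(acc, multiply(s, c % curve_order))
    return acc

commitment = kzg_commit([5, 0, 2, 1])  # commits to p(x) = 5 + 2x^2 + x^3
```

The point of the sketch is the size argument: no matter how large the committed polynomial, the commitment stays one compressed 48-byte G1 element.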

Let's consider the resulting scenario: $512\ \text{MB} = 2^{29}$ bytes, divided into $2^{5}$-byte chunks, gives $2^{24} = 16{,}777{,}216$ chunks.
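A quick sanity check of that arithmetic, as a minimal sketch (the constant names are illustrative):

```python
# Chunk-count check: 512 MB of 32-byte (256-bit) chunks.
DATA_SIZE_BYTES = 512 * 2**20   # 512 MB, using 1 MB = 2^20 bytes
CHUNK_SIZE_BYTES = 32           # 256 bits

num_chunks = DATA_SIZE_BYTES // CHUNK_SIZE_BYTES
assert num_chunks == 2**24      # 16,777,216 chunks
print(f"{num_chunks:,} chunks = 2^{num_chunks.bit_length() - 1}")
```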

  1. Direct Merkle Tree (Fraud Proof; a sketch follows this list):
  2. RS Encoding + KZG Commitments:
  3. Merkle Tree + Plonky2 SNARKs:
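As a concrete reference for option 1, here is a minimal SHA-256 Merkle tree sketch. With $2^{24}$ leaves the tree is 24 levels deep, so a single inclusion proof is $24 \times 32 = 768$ bytes. The function names are ours, and the example uses a toy leaf count for runnability.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves: list[bytes], index: int):
    """Root of a binary SHA-256 Merkle tree plus the sibling path for one
    leaf. Assumes len(leaves) is a power of two, as with 2^24 chunks."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        path.append(level[index ^ 1])  # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path

# Toy example: 8 chunks of 32 bytes; the real block has 2^24 of them.
chunks = [bytes([i]) * 32 for i in range(8)]
root, proof = merkle_root_and_proof(chunks, index=5)
assert len(proof) == 3  # log2(8) siblings; 24 siblings for 2^24 chunks
```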

The figures above treat the relevant data as a single contiguous piece. For data availability sampling, however, it appears necessary in every case to arrange the data in a matrix structure and add RS encoding. Considered from this perspective, the following conclusions were reached:
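To make that matrix structure concrete, here is a hedged dimension sketch in the style of Celestia's 2D Reed-Solomon layout (reference 7). The square arrangement, the 2x extension factor per axis, and the variable names are assumptions for illustration, not parameters fixed by the text above.

```python
import math

NUM_CHUNKS = 2**24           # 512 MB / 32 B, from the scenario above
k = math.isqrt(NUM_CHUNKS)   # 4096: side length of the data square
n = 2 * k                    # 8192: side length after 2x RS extension per axis
extended = n * n             # 2^26 chunks: the extended square is 4x the data
print(f"{k}x{k} data square -> {n}x{n} extended square ({extended:,} chunks)")
```

Under these assumptions, light clients sample cells of the 8192x8192 extended square, and any row or column can be reconstructed from half of its cells.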

References

  1. Issues with aggregate or recursive proofs: https://ethresear.ch/t/arithmetic-hash-based-alternatives-to-kzg-for-proto-danksharding-eip-4844/13863#issues-with-aggregate-or-recursive-proofs-10
  2. zkTree: https://eprint.iacr.org/2023/208.pdf
  3. Hyperproofs: https://eprint.iacr.org/2021/599.pdf
  4. Recproofs: https://uploads-ssl.webflow.com/6460ebf2b6ff254688bebf1c/64e4dd54d9198fde8d58ef44_main.pdf
  5. FRI as erasure code fraud proof: https://ethresear.ch/t/fri-as-erasure-code-fraud-proof/6610
  6. A Guide to Selecting the Right Data Availability Layer: https://blog.availproject.org/a-guide-to-selecting-the-right-data-availability-layer/
  7. Celestia's data availability layer: https://docs.celestia.org/learn/how-celestia-works/data-availability-layer
  8. The current state of storage proofs: https://defi.sucks/insights/current-state-of-storage-proofs
  9. Ideal vector commitment: https://ethresear.ch/t/open-problem-ideal-vector-commitment/7421