Owner: @Mehmet

The main purpose of this article is to compare the DA protocols found in the literature with the DA protocols we have designed, in order to select a protocol suitable for the needs of the Nomos project. The selected protocol is expected to provide verifiable encoding, proof of equivalence, scalability, and decentralized block production. Additionally, the use of super nodes in the protocol is not desired, meaning that no node should be expected to download and verify all of the original data. The selected protocol should also be compatible with DAS (Data Availability Sampling).

Considering the security and performance aspects studied in our previous work, we concluded that the RS+KZG design is the best option for us [doc]. Therefore, among the protocols found in the literature, Avail, Semi-AVID, and Ethereum Danksharding, which use this structure, were chosen for comparison. We have also designed two protocols of our own. Detailed explanations of all protocols can be accessed through the following links:

Avail: https://github.com/availproject/data-availability/blob/master/reference document/Data Availability - Reference Document.pdf

Semi-AVID: https://eprint.iacr.org/2021/1544.pdf

Danksharding: https://ethereum.org/roadmap/danksharding

NomosDA 1.5D: Data Availability Network Specification

NomosDA 2D:

For all protocols, the original data is arranged as a matrix, and each entry of the matrix is considered a chunk. In this comparison, the elliptic curve BLS12-381 has been selected, and the value of each chunk must be an element of the field defined by this curve. Therefore, each chunk is 384 bits (48 bytes) in size. Additionally, since each KZG commitment computed in the protocols is an elliptic curve point, its size is also 48 bytes.
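To make these conventions concrete, the sketch below computes the number of chunks and the size of the row commitments for the 16 MB and 64 MB cases used in the tables, based on the 48-byte chunk and commitment sizes stated above. The roughly square matrix shape and the one-commitment-per-original-row assumption are illustrative choices only; each protocol fixes its own dimensions and commitment count.

```python
# Sizing sketch based on the conventions above: 48-byte chunks and 48-byte
# KZG commitments. The square-ish matrix shape is an illustrative assumption.
import math

CHUNK_BYTES = 48       # one field element per chunk, as stated above
COMMITMENT_BYTES = 48  # one compressed elliptic curve point per commitment

def matrix_dimensions(data_bytes):
    """Return (rows, columns) for a roughly square chunk matrix."""
    chunks = -(-data_bytes // CHUNK_BYTES)   # ceiling division
    rows = math.isqrt(chunks)
    cols = -(-chunks // rows)
    return rows, cols

for size_mb in (16, 64):
    rows, cols = matrix_dimensions(size_mb * 1024 * 1024)
    commitment_kb = rows * COMMITMENT_BYTES / 1024   # one commitment per original row
    print(f"{size_mb} MB -> {rows} x {cols} chunks, ~{commitment_kb:.1f} KB of row commitments")
```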

We first provide information about the relevant protocols: how the data is expanded, the types of nodes involved, and a summary of the advantages and disadvantages of each design. We then present this information in tables, prepared assuming original data sizes of 16 MB and 64 MB. A separate Excel sheet has been prepared for other data sizes.

Avail DA Layer

avail.png

In the Avail design, the system can have two types of nodes:

  1. Classical full nodes, which store the entire block,
  2. Column full nodes, which keep only a single column of the data.

For consistency throughout the document, we will refer to classical full nodes as super nodes and column full nodes as storage nodes.

A super node takes the entire matrix $D$, extends each column to $2n$ points, and obtains an extended matrix $D'$ (original data + extended data). It then verifies, for each row of $D'$, that the commitment to the $i^{th}$ row is $C_i$ for $1 \le i \le 2n$. The number of super nodes in the system is independent of the data size and of the number of storage nodes; the Avail design does not specify it. A higher number of super nodes is considered necessary for a decentralized structure.
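The toy sketch below illustrates this check over a small prime field. A plain linear map over fixed scalars stands in for the KZG row commitment (it is not KZG, but it shares the linearity that the later checks rely on), and the field, dimensions, and data are arbitrary illustrative values.

```python
# Toy model of the super-node check: re-extend the original matrix column-wise
# and verify every published row commitment. A fixed linear map replaces KZG.
P = 65537  # small prime field modulus, for illustration only

def lagrange_eval(ys, x):
    """Evaluate at x the unique polynomial of degree < len(ys) with poly(i) = ys[i]."""
    n = len(ys)
    acc = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(n):
            if j != i:
                num = num * (x - j) % P
                den = den * (i - j) % P
        acc = (acc + yi * num * pow(den, -1, P)) % P
    return acc

def extend_columns(D):
    """RS-extend every column of the n x m matrix D, returning the 2n x m matrix D'."""
    n, m = len(D), len(D[0])
    cols = [[D[i][j] for i in range(n)] for j in range(m)]
    ext = [[lagrange_eval(col, x) for x in range(n, 2 * n)] for col in cols]
    return D + [[ext[j][k] for j in range(m)] for k in range(n)]

def toy_commit(row, gens):
    """Stand-in for a KZG row commitment: a fixed linear map of the row."""
    return sum(c * g for c, g in zip(row, gens)) % P

n, m = 4, 6
D = [[(3 * i + 5 * j + 1) % P for j in range(m)] for i in range(n)]  # toy original data
gens = [pow(7, j + 1, P) for j in range(m)]                          # fixed "generators"

# Block producer: build D' and publish one commitment per extended row.
D_ext = extend_columns(D)
C = [toy_commit(row, gens) for row in D_ext]

# Super node: given D and the published C, re-extend and check all 2n commitments.
assert all(toy_commit(row, gens) == C[i] for i, row in enumerate(extend_columns(D)))
print("all 2n row commitments verified")
```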

Storage nodes fetch and keep only a single column of the matrix $D$. Each storage node extends its column and checks whether the extended values are consistent with the extended set of commitments; this is possible because of the homomorphic nature of the commitments and witnesses. As shown in the figure above, the original data is extended column-wise using RS encoding. In this design, each storage node receives a different column, so the number of storage nodes in the system should be taken into account when the data matrix is created. Additionally, every row commitment $C_i$ is sent to the storage nodes. The data transmitted to a storage node can therefore be thought of as two parts: the column data and the commitments.
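The toy example below shows why the homomorphic (linear) nature of the commitments makes this possible: RS-extending the list of row commitments $C_1, \dots, C_n$ yields exactly the commitments of the RS-extended rows, so a node holding only one column plus the original commitments can still derive every commitment it needs to check against. As in the previous sketch, a fixed linear map stands in for KZG and all parameters are illustrative.

```python
# Toy demonstration of the homomorphic property used by storage nodes.
P = 65537  # small prime field modulus, for illustration only

def lagrange_eval(ys, x):
    n = len(ys)
    acc = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(n):
            if j != i:
                num = num * (x - j) % P
                den = den * (i - j) % P
        acc = (acc + yi * num * pow(den, -1, P)) % P
    return acc

def extend(values, n):
    """RS-extend a list of n field elements to 2n elements."""
    return list(values) + [lagrange_eval(values, x) for x in range(n, 2 * n)]

def toy_commit(row, gens):
    """Stand-in for a KZG row commitment: a fixed linear map of the row."""
    return sum(c * g for c, g in zip(row, gens)) % P

n, m = 4, 6
D = [[(3 * i + 5 * j + 1) % P for j in range(m)] for i in range(n)]
gens = [pow(7, j + 1, P) for j in range(m)]

# Commitments to the n original rows (what a storage node receives alongside its column).
C = [toy_commit(row, gens) for row in D]

# Path 1: extend the data column-wise, then commit to the newly created rows.
cols_ext = [extend([D[i][j] for i in range(n)], n) for j in range(m)]
rows_ext = [[cols_ext[j][k] for j in range(m)] for k in range(n, 2 * n)]
C_from_data = [toy_commit(row, gens) for row in rows_ext]

# Path 2: extend the commitments themselves -- no full row needed.
C_from_commitments = extend(C, n)[n:]

assert C_from_data == C_from_commitments
print("extended commitments match: linearity lets a column node verify its extension")
```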

A light client querying a data block samples some chunk $D[i][j]$. Along with the data, the light client receives a witness $w[i][j]$ and can immediately verify its validity using the Kate (KZG) commitment scheme. If it queries multiple chunks of the same row, the batch commitment scheme allows a single witness for all of the sampled points.
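The algebra behind a single-chunk opening can be sketched without the cryptography: the chunk is the evaluation $y = p(j)$ of the $i^{th}$ row polynomial $p$, and the witness encodes the quotient $q(X) = (p(X) - y)/(X - j)$; the real check verifies this identity in the exponent via a pairing, and a batch opening uses the same idea with the quotient taken by the vanishing polynomial of all sampled points. The toy code below checks the identity directly over a small prime field; all values are illustrative.

```python
# Toy sketch of the opening relation behind a single-chunk KZG witness.
P = 65537  # small prime field modulus, for illustration only

def poly_eval(coeffs, x):
    """Evaluate a polynomial given its coefficients from lowest to highest degree."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def open_at(coeffs, j):
    """Return y = p(j) and the quotient q(X) = (p(X) - y) / (X - j) via synthetic division."""
    y = poly_eval(coeffs, j)
    shifted = list(coeffs)
    shifted[0] = (shifted[0] - y) % P   # p(X) - y, which has j as a root
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for k in range(len(coeffs) - 1, 0, -1):
        carry = (shifted[k] + carry * j) % P
        quotient[k - 1] = carry
    return y, quotient

row_poly = [5, 17, 3, 11]   # toy row polynomial, coefficients from lowest to highest
j = 9                       # sampled column index
y, q = open_at(row_poly, j)

# Check the opening identity p(X) = q(X) * (X - j) + y at an arbitrary point.
r = 12345
assert poly_eval(row_poly, r) == (poly_eval(q, r) * (r - j) + y) % P
print(f"chunk value p({j}) = {y} is consistent with the quotient witness")
```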