Owner: @Marcin Pawlowski @Alexander Mozeika
Reviewers: @Mehmet @Daniel Sanchez Quiros @Álvaro Castro-Castilla
This research examines the robustness of sampling mechanisms in decentralized networks, specifically addressing how node reliability affects data availability verification in the NomosDA protocol. In distributed systems, data must be reconstructible even when a significant portion of nodes are unreliable or malicious. Our mathematical models determine optimal redundancy parameters that maximize sampling success probability while minimizing resource overhead.
We analyze how increasing column redundancy (replicating data across multiple nodes in subnetworks) affects the overall reliability of the sampling process. Through probability calculations for networks with varying proportions of reliable nodes ($1/2$ and $2/3$), we demonstrate that a redundancy factor of $R=6$ provides optimal network confidence when assuming $1/2$ reliable nodes, while $R=4$ is sufficient with $2/3$ reliable nodes. These findings provide practical implementation guidelines for real-world distributed systems like NomosDA.
The NomosDA sampling mechanism selects $20$ data columns out of $2048$ columns, with each column assigned to a dedicated subnetwork. This approach has been proven sufficient to verify data availability and correct encoding with very high probability, as documented in Column Sampling.
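For intuition, the selection step can be sketched as drawing $20$ distinct column indices uniformly at random out of the $2048$ columns. The Python snippet below is only an illustrative sketch; the function name and the use of the standard `random` module are assumptions, not the NomosDA implementation.

```python
import random

NUM_COLUMNS = 2048   # total encoded columns, one dedicated subnetwork per column
NUM_SAMPLES = 20     # columns queried in a single sampling round

def select_sample_columns(rng: random.Random) -> list[int]:
    """Draw 20 distinct column indices uniformly at random (illustrative only)."""
    return sorted(rng.sample(range(NUM_COLUMNS), NUM_SAMPLES))

print(select_sample_columns(random.Random(2024)))
```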
A naive implementation of this mechanism assumes that contacting just $20$ nodes (each holding one of $2048$ columns) is enough to determine data availability. However, this approach will likely fail when some nodes are dishonest or unreliable, even when the data is actually available.
In a network where only $1/2$ of the nodes are reliable, selecting only reliable nodes is like flipping a fair coin $20$ times and getting all tails: the probability is at most $2^{-20} \approx 10^{-6}$. Consequently, most of the network will almost certainly consider the sampling failed.
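This bound is easy to check numerically under the assumption that each queried node is reliable independently with probability $1/2$; the short sketch below does just that.

```python
p_reliable = 0.5   # assumed fraction of reliable nodes
num_samples = 20   # one node queried per sampled column in the naive scheme

# Probability that all 20 independently chosen nodes are reliable.
p_naive_success = p_reliable ** num_samples
print(f"naive sampling success probability ≈ {p_naive_success:.2e}")  # ~9.5e-07 (= 2^-20)
```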
To increase the probability of successful sampling, we need to introduce redundancy. Rather than assigning just one node to each subnetwork (which holds a single column), we assign multiple nodes to the same subnetwork, effectively increasing column redundancy. We also assume that all nodes within a subnetwork are queried.
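Under the same independence assumption, a sampled column is answered as long as at least one of the $R$ nodes in its subnetwork is reliable, which happens with probability $1-(1-p)^R$; a full sampling round over $20$ columns then succeeds with probability $\left(1-(1-p)^R\right)^{20}$. The sketch below evaluates this closed form for a few values of $R$; it is a simplified model of the querying process, not the exact analysis used for NomosDA.

```python
def sampling_success(p: float, r: int, num_samples: int = 20) -> float:
    """P(all 20 sampled columns are answered) when each subnetwork holds r copies
    of its column and each node is reliable independently with probability p."""
    per_column = 1.0 - (1.0 - p) ** r   # at least one reliable node in the subnetwork
    return per_column ** num_samples

for p in (1 / 2, 2 / 3):
    for r in (1, 2, 4, 6, 8):
        print(f"p={p:.2f}  R={r}:  success ≈ {sampling_success(p, r):.4f}")
```

In this simplified model, $R=6$ at $p=1/2$ gives a per-node success probability of about $0.73$, and $R=4$ at $p=2/3$ gives about $0.78$, both comfortably above the $50\%$ threshold introduced below.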
To ensure a robust network effect, the probability of successful sampling must exceed $50\%$ for most nodes. Additionally, the majority of nodes must share the same view of the sampling outcome. We therefore define network confidence as the probability that most of the network reaches the same conclusion. For the system to operate reliably under adversarial conditions, this confidence should approach 1, ensuring a consistent and coordinated network-wide perspective.
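One way to make this concrete, purely as a sketch, is to model each of $N$ honest observers as succeeding independently with the per-node probability derived above and to compute the binomial probability that a strict majority of them succeeds; the actual definition of network confidence used in the NomosDA analysis may differ. With $p=1/2$ and $R=6$, this majority probability is already very close to $1$ for networks of a few hundred nodes.

```python
from math import comb

def majority_agreement(q: float, n: int) -> float:
    """P(a strict majority of n independent observers see a successful sampling),
    where each observer succeeds independently with probability q."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n // 2 + 1, n + 1))

# Per-node success probability in the simplified model: p = 1/2, R = 6.
q = (1 - 0.5**6) ** 20
for n in (101, 501, 1001):
    print(f"N={n}: P(majority sees success) ≈ {majority_agreement(q, n):.6f}")
```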