Owner: @Marcin Pawlowski @Alexander Mozeika
Reviewers: @Mehmet @Daniel Sanchez Quiros @Álvaro Castro-Castilla
This research examines the robustness of sampling mechanisms in decentralized networks, specifically addressing how node reliability affects data availability verification in the NomosDA protocol. In distributed systems, data must be reconstructible even when a significant portion of nodes are unreliable or malicious. Our mathematical models determine optimal redundancy parameters that maximize sampling success probability while minimizing resource overhead.
We analyze how increasing column redundancy (replicating data across multiple nodes in subnetworks) affects the overall reliability of the sampling process. Through probability calculations for networks with varying proportions of reliable nodes ($1/2$ and $2/3$), we demonstrate that a redundancy factor of $R=6$ provides optimal network confidence when assuming $1/2$ reliable nodes, while $R=4$ is sufficient with $2/3$ reliable nodes. These findings provide practical implementation guidelines for real-world distributed systems like NomosDA.
The NomosDA sampling mechanism selects $20$ data columns out of $2048$ columns, with each column assigned to a dedicated subnetwork. This approach has been proven sufficient to verify data availability and correct encoding with very high probability, as documented in Column Sampling.
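For intuition, the selection step can be sketched as drawing $20$ distinct column indices uniformly at random out of the $2048$ columns. The Python snippet below is only an illustrative sketch; the function name and the use of the standard `random` module are assumptions, not the NomosDA implementation.

```python
import random

NUM_COLUMNS = 2048   # total encoded columns, one dedicated subnetwork per column
NUM_SAMPLES = 20     # columns queried in a single sampling round

def select_sample_columns(rng: random.Random) -> list[int]:
    """Draw 20 distinct column indices uniformly at random (illustrative only)."""
    return sorted(rng.sample(range(NUM_COLUMNS), NUM_SAMPLES))

print(select_sample_columns(random.Random(2024)))
```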
A naive implementation of this mechanism assumes that contacting just $20$ nodes (each holding one of $2048$ columns) is enough to determine data availability. However, this approach will likely fail when some nodes are dishonest or unreliable, even when the data is actually available.
In a network where only $1/2$ of the nodes are reliable, selecting only reliable nodes is like flipping a fair coin $20$ times and getting all tails: the probability is at most $2^{-20} \approx 10^{-6}$. Consequently, most of the network will almost certainly consider the sampling failed.
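This bound is easy to check numerically under the assumption that each queried node is reliable independently with probability $1/2$; the short sketch below does just that.

```python
p_reliable = 0.5   # assumed fraction of reliable nodes
num_samples = 20   # one node queried per sampled column in the naive scheme

# Probability that all 20 independently chosen nodes are reliable.
p_naive_success = p_reliable ** num_samples
print(f"naive sampling success probability ≈ {p_naive_success:.2e}")  # ~9.5e-07 (= 2^-20)
```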
To increase the probability of successful sampling, we need to introduce redundancy. Rather than assigning just one node to each subnetwork (which holds a single column), we assign multiple nodes to the same subnetwork, effectively increasing column redundancy. We also assume that all nodes within a subnetwork are queried.
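Under the same independence assumption, a sampled column is answered as long as at least one of the $R$ nodes in its subnetwork is reliable, which happens with probability $1-(1-p)^R$; a full sampling round over $20$ columns then succeeds with probability $\left(1-(1-p)^R\right)^{20}$. The sketch below evaluates this closed form for a few values of $R$; it is a simplified model of the querying process, not the exact analysis used for NomosDA.

```python
def sampling_success(p: float, r: int, num_samples: int = 20) -> float:
    """P(all 20 sampled columns are answered) when each subnetwork holds r copies
    of its column and each node is reliable independently with probability p."""
    per_column = 1.0 - (1.0 - p) ** r   # at least one reliable node in the subnetwork
    return per_column ** num_samples

for p in (1 / 2, 2 / 3):
    for r in (1, 2, 4, 6, 8):
        print(f"p={p:.2f}  R={r}:  success ≈ {sampling_success(p, r):.4f}")
```

In this simplified model, $R=6$ at $p=1/2$ gives a per-node success probability of about $0.73$, and $R=4$ at $p=2/3$ gives about $0.78$, both comfortably above the $50\%$ threshold introduced below.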
To ensure a robust network effect, the probability of successful sampling must exceed $50\%$ for most nodes. Additionally, the majority of nodes must share the same view of the sampling outcome. We therefore define network confidence as the probability that most of the network reaches the same conclusion. For the system to operate reliably under adversarial conditions, this confidence should approach 1, ensuring a consistent and coordinated network-wide perspective.
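One way to make this concrete, purely as a sketch, is to model each of $N$ honest observers as succeeding independently with the per-node probability derived above and to compute the binomial probability that a strict majority of them succeeds; the actual definition of network confidence used in the NomosDA analysis may differ. With $p=1/2$ and $R=6$, this majority probability is already very close to $1$ for networks of a few hundred nodes.

```python
from math import comb

def majority_agreement(q: float, n: int) -> float:
    """P(a strict majority of n independent observers see a successful sampling),
    where each observer succeeds independently with probability q."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n // 2 + 1, n + 1))

# Per-node success probability in the simplified model: p = 1/2, R = 6.
q = (1 - 0.5**6) ** 20
for n in (101, 501, 1001):
    print(f"N={n}: P(majority sees success) ≈ {majority_agreement(q, n):.6f}")
```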