Owner: @Mehmet
Reviewers: @Álvaro Castro-Castilla, @Daniel Sanchez Quiros
Introduction
In data availability sampling, the goal is to ensure that light nodes, which are resource-constrained and cannot store or download the entire dataset, can reliably verify data availability by downloading only small, random portions of the dataset. This verification process prevents malicious actors from making data unavailable while evading detection.
This document examines the setup for NomosDA, a data availability protocol that uses erasure coding to protect against data withholding attacks. We evaluate two sampling strategies, chunk sampling and column sampling, and derive formulas for detecting unavailable shares with a specified confidence level. We then analyze how each strategy performs under practical system parameters.
Overview
This document analyzes the NomosDA data availability protocol, focusing on two sampling strategies, chunk sampling and column sampling, to ensure reliable data verification for light nodes. The key findings are summarized below:
- Efficiency Comparison:
- Column Sampling is significantly more efficient than Chunk Sampling for achieving the same confidence level.
- For a dataset represented by a $32 \times 1024$ matrix (1 MB of data):
- Achieving 99.99% confidence requires 16 samples with Column Sampling, compared to 228 samples with Chunk Sampling.
- Robustness Against Adversaries:
- Chunk Sampling: The adversary must make more than 50% of the chunks in at least one row (i.e., at least $n+1$ of its $2n$ chunks) unavailable to prevent recovery.
- Column Sampling: The adversary must withhold at least $n+1$ columns (e.g., 1025 columns for $2n=2048$) to render the data unrecoverable.
- Practical Implications:
- Column Sampling is better suited for large matrix sizes due to its reduced sampling requirements and faster verification times.
- Light nodes can achieve high confidence in data availability with minimal resources using Column Sampling.
These results highlight the practical advantages of Column Sampling in terms of both efficiency and simplicity, making it the preferred strategy for NomosDA in most scenarios.
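For intuition about these sample counts, the snippet below is a minimal sketch of the standard argument that each independent sample misses the withheld portion with probability $1 - p$, so $s$ samples detect it with probability $1 - (1 - p)^s$. It assumes sampling with replacement and an illustrative withheld fraction of 0.5; the exact counts quoted above (16 and 228) come from the detailed derivations later in this document, which use the protocol's actual parameters.

```python
import math

def min_samples(withheld_fraction: float, confidence: float) -> int:
    """Smallest number of independent samples (with replacement) needed so
    that at least one sample hits the withheld portion with probability
    >= `confidence`, i.e. solve 1 - (1 - p)^s >= confidence for s."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - withheld_fraction))

# Example: if an adversary withholds half of the sampled population,
# 1 - 0.5^s >= 0.9999 first holds at s = 14 under this simplified model.
print(min_samples(withheld_fraction=0.5, confidence=0.9999))  # -> 14
```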
Data Encoding and Distribution
- The original data is represented as a $k \times n$ matrix, where $k$ is the number of rows and $n$ the number of columns.
- Using RS encoding at rate 1/2, each row is extended from $n$ to $2n$ chunks, producing a $k \times 2n$ matrix.
- The encoded matrix is distributed among $2n$ subnets, with each participating node holding one column of the matrix (a minimal sketch of this layout follows the list).
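To make the encoding and distribution concrete, the sketch below extends each row from $n$ to $2n$ chunks and hands column $j$ of the extended matrix to subnet $j$. It treats chunks as elements of an illustrative prime field and uses naive Lagrange interpolation as a stand-in for a production RS implementation; the field modulus, function names, and toy dimensions are assumptions for illustration, not taken from the NomosDA specification.

```python
# Stand-in RS extension in evaluation form: each row of n chunks is read as the
# evaluations of a degree-(n-1) polynomial at points 0..n-1 and extended to
# points n..2n-1. PRIME is a toy modulus chosen only for this example.
PRIME = 2**31 - 1

def lagrange_eval(ys, x, p=PRIME):
    """Evaluate at `x` the unique degree-(len(ys)-1) polynomial taking the
    values `ys` at points 0..len(ys)-1, working modulo `p`."""
    n = len(ys)
    total = 0
    for i in range(n):
        num, den = 1, 1
        for j in range(n):
            if i != j:
                num = num * (x - j) % p
                den = den * (i - j) % p
        total = (total + ys[i] * num * pow(den, -1, p)) % p
    return total

def extend_row(row, p=PRIME):
    """Extend a row of n chunks to 2n chunks (rate-1/2 erasure coding)."""
    n = len(row)
    return list(row) + [lagrange_eval(row, x, p) for x in range(n, 2 * n)]

def encode_and_distribute(matrix):
    """Extend every row, then assign column j of the k x 2n matrix to subnet j."""
    extended = [extend_row(row) for row in matrix]
    two_n = len(extended[0])
    return {j: [row[j] for row in extended] for j in range(two_n)}

# Toy example: k = 2 rows, n = 4 columns -> 2n = 8 subnets, one column each.
subnets = encode_and_distribute([[1, 2, 3, 4], [5, 6, 7, 8]])
print(len(subnets), subnets[0])  # 8 subnets; subnet 0 holds column 0 of both rows
```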
Sampling Strategies
Two data availability sampling strategies are analyzed (a minimal sketch of both follows the list):
- Chunk Sampling: The light node samples small chunks (individual elements) from random row-column coordinates.
- Goal: Ensure that at least 50% of the chunks in each row are available to reconstruct the original data.
- Column Sampling: The light node samples entire columns of the matrix.
- Goal: Ensure that at least $n$ columns are available for reconstruction.
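The sketch below shows how a light node could run both checks, assuming shares are exposed as a column-indexed map in which a withheld chunk is simply absent (`None`); the data layout, function names, and withholding pattern are illustrative assumptions rather than part of the protocol.

```python
import random

def chunk_sampling(shares, k, two_n, num_samples, rng=random):
    """Query `num_samples` random (row, column) coordinates and flag the data
    as unavailable if any queried chunk is withheld."""
    for _ in range(num_samples):
        row, col = rng.randrange(k), rng.randrange(two_n)
        if shares[col][row] is None:
            return False  # a withheld chunk was hit
    return True

def column_sampling(shares, two_n, num_samples, rng=random):
    """Query `num_samples` distinct random columns and flag the data as
    unavailable if any queried column is not fully served."""
    for col in rng.sample(range(two_n), num_samples):
        if any(chunk is None for chunk in shares[col]):
            return False
    return True

# Toy scenario: k = 2 rows, 2n = 8 columns; an adversary withholds columns 5..7.
k, two_n = 2, 8
shares = {j: ([None] * k if j >= 5 else [j, j]) for j in range(two_n)}
print(chunk_sampling(shares, k, two_n, num_samples=4))
print(column_sampling(shares, two_n, num_samples=4))
```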
Adversary Model