Owner: @Alexander Mozeika @Mehmet @Marcin Pawlowski

Introduction

Scalable blockchains increasingly rely on data availability sampling (DAS) — a technique that allows nodes to verify that a block's data can be fully reconstructed, without downloading the block in its entirety.

Instead of retrieving the full dataset, each node randomly samples a small number of pieces. If enough independent nodes perform these checks, the network gains statistical confidence that the data is intact and recoverable. This approach is central to modern scalable architectures such as Ethereum's Danksharding roadmap, Celestia, and Avail.
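To make the "statistical confidence" concrete, here is a minimal sketch of the detection probability under random sampling. It assumes samples are drawn independently and uniformly (with replacement), and that an adversary withholds a fixed fraction of the coded shares; the function names and the rate-1/2 erasure-code example are illustrative assumptions, not the production protocol.

```python
def p_missed_detection(withheld_fraction: float, num_samples: int) -> float:
    """Probability that every one of num_samples independent random samples
    lands on an available share, so the withholding goes undetected."""
    return (1.0 - withheld_fraction) ** num_samples

# Illustrative assumption: with a rate-1/2 erasure code, data becomes
# unrecoverable only if at least half the shares are withheld, so we
# check detection power at withheld_fraction = 0.5.
for k in (10, 20, 30):
    print(k, p_missed_detection(0.5, k))
```

The miss probability decays geometrically in the number of samples, which is why even a small per-node sample count yields strong network-wide confidence.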

But sampling is not certainty. It is evidence.

This distinction matters. Data availability sampling is fundamentally a statistical decision process, and this work makes that explicit by modelling it as a hypothesis test.

Statistical Framing

We formalised sampling as a binary hypothesis test:

- H₀ (null hypothesis): the data is unrecoverable.
- H₁ (alternative hypothesis): the data is recoverable.

The network begins from a conservative position, assuming the data is unrecoverable, and rejects that assumption only when sampling provides sufficient evidence. This framing naturally produces two distinct failure modes.
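The decision rule above can be sketched in code. The parameterisation here is an assumption for illustration: under H₀ at most half the coded shares are available (a rate-1/2 erasure code), a node rejects H₀ only if all k of its samples are returned, and the function names are hypothetical.

```python
import math

def type_i_error(k: int, available_under_h0: float = 0.5) -> float:
    """Worst-case probability of falsely rejecting H0: all k samples
    succeed even though the data cannot be reconstructed."""
    return available_under_h0 ** k

def samples_needed(alpha: float, available_under_h0: float = 0.5) -> int:
    """Smallest k such that the Type I error is at most alpha."""
    return math.ceil(math.log(alpha) / math.log(available_under_h0))

# e.g. samples_needed(1e-9) -> 30 samples for a one-in-a-billion bound
```

Under these assumptions the sample count needed for a given safety level grows only logarithmically in 1/α, which is what makes sampling cheap relative to downloading the block.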

🔴 Type I Error — False Rejection of H₀

The data is actually unrecoverable, but sampling concludes it is recoverable.

Effect:

This error threatens blockchain **safety**.

🔵 Type II Error — False Acceptance of H₀

The data is actually recoverable, but sampling concludes it is unrecoverable.
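A Type II error can occur even when the data is fully available, for example when individual sample requests fail transiently. The sketch below assumes each of the k samples fails independently with some probability; that failure model and the function name are assumptions for illustration.

```python
def type_ii_error(k: int, per_sample_failure: float) -> float:
    """Probability that at least one of k independent samples fails,
    causing the node to (wrongly) retain H0 and treat available data
    as unrecoverable."""
    return 1.0 - (1.0 - per_sample_failure) ** k
```

Note the tension with the Type I calculation: increasing k drives the Type I error down but, under this model, pushes the Type II error up, so the sample count trades one failure mode against the other.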