I scraped https://cexplorer.io/pool for the stake value of every pool (validator) in Cardano. Here is a quick breakdown of the stats.

Cardano Pool Data:

pools.csv

Analysis

The histogram suggests that stake per pool follows a classic power law.

[Figure: histogram of stake per pool]
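One quick way to check the power-law impression is to count pools in logarithmically spaced bins: for a power law, the counts should fall off roughly linearly on a log-log scale. The sketch below is only illustrative. The function name `log_binned_counts` and the `bins_per_decade` parameter are my own, and the sample is synthetic, not the scraped pools.csv data.

```python
import math
import random

def log_binned_counts(stakes, bins_per_decade=2):
    """Count values in logarithmically spaced bins.

    Returns (bin_lower_edge, count) pairs sorted by edge. Non-positive
    values are skipped because they have no logarithm.
    """
    counts = {}
    for s in stakes:
        if s <= 0:
            continue
        idx = math.floor(math.log10(s) * bins_per_decade)
        counts[idx] = counts.get(idx, 0) + 1
    return [(10 ** (i / bins_per_decade), c) for i, c in sorted(counts.items())]

# Synthetic stand-in for the stake column (assumption: NOT the real data).
random.seed(0)
sample = [1000 * random.paretovariate(1.0) for _ in range(5000)]

for edge, count in log_binned_counts(sample):
    print(f"{edge:>16,.0f} ADA  {count}")
```

With the real data, the list of stakes would come from whichever column of pools.csv holds the stake value.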

Fitting to Pareto

Attempts to fit a Pareto distribution to this data failed, however. The closest I was able to get is the following:

[Figure: closest Pareto fit to the stake histogram]

The failure to fit the data to a Pareto distribution has me thinking that there are anomalies in the distribution, possibly caused by external incentives that skew it.
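For reference, here is a minimal sketch of one standard way to fit a Pareto tail and judge the fit. It is not necessarily the method used for the plot above. It uses the maximum-likelihood estimate $\hat{\alpha} = n / \sum_i \ln(x_i / x_{min})$ and a Kolmogorov-Smirnov distance between the empirical and fitted CDFs. The function names and the choice of `x_min` are assumptions, and the sample is synthetic.

```python
import math
import random

def fit_pareto(values, x_min):
    """Maximum-likelihood estimate of the Pareto shape alpha for values >= x_min."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return len(tail) / log_sum

def ks_distance(values, x_min, alpha):
    """Largest gap between the empirical tail CDF and the fitted Pareto CDF."""
    tail = sorted(v for v in values if v >= x_min)
    n = len(tail)
    worst = 0.0
    for i, v in enumerate(tail):
        model = 1.0 - (x_min / v) ** alpha
        # Compare against the empirical CDF just before and just after v.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

# Synthetic Pareto sample (assumption: stands in for the real stake data).
random.seed(1)
sample = [1000 * random.paretovariate(1.5) for _ in range(20000)]

alpha_hat = fit_pareto(sample, x_min=1000)
print("estimated alpha:", round(alpha_hat, 3))
print("KS distance:", round(ks_distance(sample, 1000, alpha_hat), 4))
```

On genuinely Pareto data the KS distance is small. A large distance on the pool data would be consistent with the anomalies discussed next.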

Anomalies in the Distribution

Removing the low-stake pools from the distribution reveals two peaks and a sharp decline after 70MM ADA:

[Figure: stake histogram with low-stake pools removed]

These two peaks occur at 32.7MM ADA and 69.9MM ADA respectively.
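Peaks like these can also be located programmatically by binning the stakes and looking for bins that are local maxima. The sketch below is a simple illustration, not the procedure used for the plot. `histogram_peaks`, `bin_width`, and `min_count` are hypothetical names, and the input is a toy list in units of MM ADA rather than the real data.

```python
from collections import Counter

def histogram_peaks(stakes, bin_width, min_count):
    """Return (bin_lower_edge, count) for bins that are local maxima.

    A bin counts as a peak if it holds at least min_count values,
    strictly more than its left neighbour, and at least as many as
    its right neighbour.
    """
    counts = Counter(int(s // bin_width) for s in stakes)
    peaks = []
    for idx, c in counts.items():
        left = counts.get(idx - 1, 0)
        right = counts.get(idx + 1, 0)
        if c >= min_count and c > left and c >= right:
            peaks.append((idx * bin_width, c))
    return sorted(peaks)

# Toy stakes in MM ADA (assumption: made-up numbers, not the scraped data).
toy = [10] * 3 + [20] * 4 + [32] * 9 + [45] * 2 + [55] * 3 + [69] * 7 + [80] * 1

print(histogram_peaks(toy, bin_width=10, min_count=5))
```

The choice of `bin_width` matters: bins that are too wide merge nearby peaks, and bins that are too narrow turn noise into peaks.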

Some research shows that Cardano has a concept of “pool saturation”, controlled by a global “saturation parameter” ($k$). This parameter sets the target number of pools in the network. The target is enforced through a soft “stake cap”: rewards stop growing once a pool's stake exceeds the cap. For example, with a stake cap of 100 ADA, a pool holding 200 ADA earns the same rewards as a pool holding 100 ADA.
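The effect of the soft cap can be sketched in a couple of lines. This is a deliberate simplification that models only the capping behaviour described above; the real reward formula has further terms (pledge, for instance) that are ignored here. The function names are my own.

```python
def rewarded_stake(pool_stake, stake_cap):
    """Stake that actually counts toward rewards under a soft cap (simplified)."""
    return min(pool_stake, stake_cap)

def reward_share_per_ada(pool_stake, stake_cap):
    """Fraction of each staked ADA that still earns rewards (simplified)."""
    return rewarded_stake(pool_stake, stake_cap) / pool_stake

# The example from the text: a cap of 100 ADA.
print(rewarded_stake(200, 100))        # same rewarded stake as a 100 ADA pool
print(reward_share_per_ada(200, 100))  # each ADA earns half as much
```

This is why oversaturation hurts delegators: the same total reward is spread over more stake.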

Currently $k=500$, which sets the stake cap at 64MM ADA. IOHK blog posts suggest there is a plan to move to $k=1000$ in the future, which would halve the stake cap to roughly 32MM ADA.
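The relationship between $k$ and the cap can be sketched as follows, assuming the cap is simply the total stake divided by $k$. The total used here is hypothetical: it is back-derived from the 64MM figure at $k=500$, not taken from chain data.

```python
def stake_cap(total_stake, k):
    """Saturation point per pool, assuming cap = total stake / k."""
    return total_stake / k

# Assumption: total back-derived as 64MM ADA * 500, not an official figure.
TOTAL_STAKE_ADA = 32_000_000_000

for k in (500, 1000):
    print(f"k={k}: cap = {stake_cap(TOTAL_STAKE_ADA, k) / 1e6:.0f}MM ADA")
```

Under this assumption, doubling $k$ halves the cap, so a pool sitting at the current cap would be 100% oversaturated after the change.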

I suspect the peak at ~70MM ADA is the result of pool operators who are slightly over the 64MM cap but don’t yet feel enough incentive to split into smaller pools.

The other peak at ~32.7MM ADA likely corresponds to pools that are anticipating the switch to $k=1000$ and hoping to avoid losing revenue to the lower stake cap.