Owners: @Alexander Mozeika @David Rusu
Details of derivations are in the documents Statistical inference of relative stake and Analysis of leader election process in PoS.
Stake Distribution Strategies Based on Adversarial Inference Which Uses a Naive Estimator
- The adversary observes the leader election process of a node with the relative stake $\alpha$.
- In $T$ time slots, he/she is able to observe $n$ wins in $m$ observations.
- For $m\geq1$ he/she uses the naive estimator $\hat{\alpha}=\frac{\log\left(1-n/m\right)}{\log(1-f)}$ of the true relative stake $\alpha$.
- Here $f$, known to adversary, is the fraction of time-slots with at least one winner.
- For “accuracy” $\gamma\in(0,1)$, the probability that $\alpha(1-\gamma)\leq\hat{\alpha}\leq\alpha(1+\gamma)$ for large $T$ is given by $\mathrm{P}\left(\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]\right)=\frac{2 \,\mathrm{erf}\! \left(\frac{ \epsilon}{\sqrt{2\sigma^2(q)}}\right)}{\mathrm{erf}\! \left(\frac{s }{ \sqrt{2\sigma^2(q)}}\right)+\mathrm{erf}\! \left(\frac{ 1-s}{ \sqrt{2\sigma^2(q)}}\right)},$ where $s\equiv\phi(\alpha)$ is the lottery function, $\epsilon=\gamma\alpha\frac{\mathrm{d}}{\mathrm{d}\alpha}\phi(\alpha)$ and $\sigma^2(q)=s(1-s)/T q$. Here $q$ is the fraction of observed time-slots such that $Tq$ slots are observed on average.
- An example of above probability is given below.
![The probability that inferred relative stake $\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]$, i.e. adversarial “confidence”, as a function of the true relative stake $\alpha$ obtained in $T=432000$ time-slots (the number used in Cardano) when fraction $q=0.657$ of slots is observed. Here the probability that the stake of a node with the true stake $\alpha=0.0126$ (the max. stake in the Bitcoin network), represented by a red vertical line, is inferred with an “accuracy” within the fraction $\gamma=0.1$ of relative stake $\alpha$, represented by $\alpha(1\pm\gamma)$ red vertical dotted lines, is approx. $0.824$. The red dashed horizontal line corresponds to the threshold $\theta=0.5$. The blue vertical line at $\alpha=0.00252$ is the result of dividing the stake $\alpha=0.0126$ into $5$ nodes.](https://prod-files-secure.s3.us-west-2.amazonaws.com/1518abd9-c08f-4989-93c1-96525e62bce5/694d7ce0-ede2-44dc-ac13-5ad0bb78b47d/adver_conf.png)
The probability that inferred relative stake $\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]$, i.e. adversarial “confidence”, as a function of the true relative stake $\alpha$ obtained in $T=432000$ time-slots (the number used in Cardano) when fraction $q=0.657$ of slots is observed. Here the probability that the stake of a node with the true stake $\alpha=0.0126$ (the max. stake in the Bitcoin network), represented by a red vertical line, is inferred with an “accuracy” within the fraction $\gamma=0.1$ of relative stake $\alpha$, represented by $\alpha(1\pm\gamma)$ red vertical dotted lines, is approx. $0.824$. The red dashed horizontal line corresponds to the threshold $\theta=0.5$. The blue vertical line at $\alpha=0.00252$ is the result of dividing the stake $\alpha=0.0126$ into $5$ nodes.
- Having estimated the fraction of observed time-slots $q$ and accuracy $\gamma$ a node can use its stake $\alpha$, to compute the probability $\delta(\alpha)=\mathrm{P}\left(\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]\right)$, i.e. the “confidence” obtained by an adversary in $T$ time-slots. For a node, it is beneficial to reduce the latter, which can be done by distributing its stake among a number of nodes. To this end, the probability $\delta(\alpha)$ is compared with some threshold $\theta$ and if $\delta(\alpha)\geq\theta$ then the stake is divided, i.e. $\alpha\leftarrow\alpha/2$, $\alpha\leftarrow\alpha/3$, etc., until $\delta(\alpha)<\theta$.
- The main functions of the algorithm which uses $\delta(\alpha)$ to distribute the stake are as follows:
from math import erf, sqrt, log
def phi(alpha):
global f
return 1 - (1 - f)**alpha
def dphi(alpha):
global f
return -((1 - f)**alpha) * log(1 - f)
def Prob2(alpha, epsilon, T, q):
sqrt2 = sqrt(2.0)
numerator = -2.0 * erf(sqrt2 * epsilon / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q))))
denominator = (-erf(phi(alpha) * sqrt2 / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q)))) + erf(sqrt2 * (phi(alpha) - 1) / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q)))))
return numerator / denominator
- The above functions are then used to find minimum number of nodes such that distributing the stake into these nodes reduces the probability $\delta(\alpha)$ to $\theta$ as follows
import math
# Define parameters
T = 432000 # number of time-slots in one epoch
theta = 0.5 # adversarial confidence threshold
gamma0 = 0.1 # adversarial accuracy
a = 0.3 # fraction of compromised paths in the mixnet
r = 3 # redundancy in messages sent through the mixnet
q0 = 1 - (1 - a)**r # fraction of compromised messages
f = 0.05
n_max = 10 # maximum number of iterations
alpha0 = 0.0126 #initial stake
# Initialize relative stake alpha and delta
alpha = alpha0
epsilon = dphi(alpha) * alpha * gamma0
delta = Prob2(alpha, epsilon, T, q0)
# Loop until delta <= theta or n reaches n_max
n = 2
while delta > theta and n <= n_max:
alpha = alpha0 / n
epsilon = dphi(alpha) * alpha * gamma0
delta = Prob2(alpha, epsilon, T, q0)
n += 1
# Update alpha and Prob
alpha = alpha0 / (n - 1)
epsilon = dphi(alpha) * alpha * gamma0
delta = Prob2(alpha, epsilon, T, q0)
print("Final num. of nodes:", n-1)
print("Final alpha:", alpha)
print("Final Prob:", delta)
- The above program suggests that the stake $\alpha=0.0126$ has to be divided among 5 nodes for the adversarial confidence $\delta(\alpha)<0.5$ when $0.3$ of paths in the mixnet are compromised (this is $80\times 3$, where $3$ is the number of layers, with a mixnet sampled from $800$ nodes with $400$ adversarial nodes) and each message is sent $3$ times giving $0.657$ for the fraction of messages being compromised.
Analysis of Naive Estimator
- The naive estimator of relative stake $\hat{\alpha}_i=\frac{\log\left(1-\hat{P}_i(1)\right)}{\log(1-f)}$ is obtained from the maximum likelihood (ML) estimator by setting $\lambda=0$.
- The probability that $\alpha_i-\epsilon\leq\hat{\alpha}i\leq\alpha_i+\epsilon$, where $\alpha_i$ is true relative stake and $\epsilon$ is “accuracy”, is given by $P\left(\phi_f(\alpha_i-\epsilon)\,m\leq n \leq \phi_f(\alpha_i+\epsilon)\,m\vert m>0\right)=\sum{m=1}^T\sum_{n=0}^m\frac{P\left(m\vert T \right)P\left(n\vert m \right)}{1-(1-q)^T} 1\left[\phi_f(\alpha_i-\epsilon)\,m\leq n \leq \phi_f(\alpha_i+\epsilon)\,m\right]$, where $P\left(m\vert T \right)$ is the binomial distribution of number of observations $m$, with the parameter $q$ such that $q\,T$ is the average number of observations, and $P\left(n\vert m \right)$ is the binomial distribution of the number of observed wins n, with the parameter $\phi_f(\alpha_i)$ such that $\phi_f(\alpha_i)\, m$ is the average number of observed wins.
- The probability of $\alpha_i-\epsilon\leq\hat{\alpha}_i\leq\alpha_i+\epsilon$ can be interpreted as “confidence”. The probability that $\alpha_i-\epsilon\leq\hat{\alpha}_i$ , given by $P\left(\phi_f(\alpha_i-\epsilon)\,m\leq n\vert m>0\right)$, is also of interest. However, for $\hat{P}_i(1)>f$, which can happen for short observation times $T$, the estimator $\hat{\alpha}_i>1$ (and hence the probability of $\alpha_i-\epsilon\leq\hat{\alpha}_i$ ) can be considered only for long observation times $T$ where the probability of the event $\hat{\alpha}_i>1$ is small.