Preliminary Research: Cryptarchia De-anonymisation of relative stake

Owners: @Alexander Mozeika @David Rusu

Details of derivations are in the overleaf documents Inference of stake in privacy-preserving PoS and Mathematical and statistical aspects of PoS.

Stake distribution strategies based on adversarial inference which uses naive estimator

The adversary observes leader election process of a node with the relative stake $\alpha$.
In $T$ time slots he/she is able to observe $n$ wins in $m$ observations.
For $m\geq1$ he/she uses the naive estimator $\hat{\alpha}=\frac{\log\left(1-n/m\right)}{\log(1-f)}$ of the true relative stake $\alpha$.
The probability that $\alpha(1-\gamma)\leq\hat{\alpha}\leq\alpha(1+\gamma)$ for large $T$ is given by $\mathrm{P}\left(\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]\right)=\frac{2 \,\mathrm{erf}\! \left(\frac{ \epsilon}{\sqrt{2\sigma^2(q)}}\right)}{\mathrm{erf}\! \left(\frac{s }{ \sqrt{2\sigma^2(q)}}\right)+\mathrm{erf}\! \left(\frac{ 1-s}{ \sqrt{2\sigma^2(q)}}\right)},$ where $s\equiv\phi(\alpha)$ is the lottery function, $\epsilon=\gamma\alpha\frac{\mathrm{d}}{\mathrm{d}\alpha}\phi(\alpha)$ and $\sigma^2(q)=s(1-s)/T q$. Here $q$ is the fraction of observed time-slots such that $Tq$ slots are observed on average.
An example of above probability is given below.

$The probability that inferred relative stake $\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]$, i.e. adversarial “confidence”, as a function of true relative stake $\alpha$ obtained in $T=432000$ , the number used in Cardano, time-slots when fraction $q=0.657$ of slots is observed. Here the probability that stake of a node with the true stake $\alpha=0.0126$ (the max. stake in the Bitcoin network), represented by red vertical line, is inferred with an “accuracy” within the fraction $\gamma=0.1$ of relative stake $\alpha$, represented by $\alpha(1\pm\gamma)$ red vertical dotted lines, is approx. $0.824$. The red dashed horizontal line corresponds to the threshold $\theta=0.5$. The blue vertical line at $\alpha=0.00252$ is the result of dividing the stake $\alpha=0.0126$ into $5$ nodes.$

The probability that inferred relative stake $\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]$, i.e. adversarial “confidence”, as a function of true relative stake $\alpha$ obtained in $T=432000$ , the number used in Cardano, time-slots when fraction $q=0.657$ of slots is observed. Here the probability that stake of a node with the true stake $\alpha=0.0126$ (the max. stake in the Bitcoin network), represented by red vertical line, is inferred with an “accuracy” within the fraction $\gamma=0.1$ of relative stake $\alpha$, represented by $\alpha(1\pm\gamma)$ red vertical dotted lines, is approx. $0.824$. The red dashed horizontal line corresponds to the threshold $\theta=0.5$. The blue vertical line at $\alpha=0.00252$ is the result of dividing the stake $\alpha=0.0126$ into $5$ nodes.

Assuming the fraction of observed time-slots $q$ and accuracy $\gamma$ a node can use its stake $\alpha$, to compute the probability $\delta(\alpha)=\mathrm{P}\left(\hat{\alpha}\in[\alpha(1-\gamma), \alpha(1+\gamma)]\right)$, i.e. the “confidence” obtained by an adversary in $T$ time-slots. For a node it is beneficial to reduce the latter which can be done by distributing its stake among a number of nodes. To this end the probability $\delta(\alpha)$ is compared with some threshold $\theta$ and if $\delta(\alpha)\geq\theta$ then the stake is divided, i.e. $\alpha\leftarrow\alpha/2$, $\alpha\leftarrow\alpha/3$, etc., until $\delta(\alpha)<\theta$.
The main functions of the algorithm which uses $\delta(\alpha)$ to distribute the stake are as follows

from math import erf, sqrt, log

def phi(alpha):
    global f
    return 1 - (1 - f)**alpha

def dphi(alpha):
    global f
    return -((1 - f)**alpha) * log(1 - f)

def Prob2(alpha, epsilon, T, q):
    global phi
    
    sqrt2 = sqrt(2.0)
    numerator = -2.0 * erf(sqrt2 * epsilon / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q))))
    denominator = (-erf(phi(alpha) * sqrt2 / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q)))) + erf(sqrt2 * (phi(alpha) - 1) / (2 * sqrt(phi(alpha) * (1 - phi(alpha)) / (T * q)))))
    return numerator / denominator

The above functions are then used to find minimum number of nodes such that distributing the stake into these nodes reduces the probability $\delta(\alpha)$ to $\theta$ as follows

import math

# Define parameters
T = 432000  # number of time-slots in one epoch
theta = 0.5  # adversarial confidence threshold
gamma0 = 0.1  # adversarial accuracy
a = 0.3  # fraction of compromised paths in the mixnet
r = 3  # redundancy in messages sent through the mixnet
q0 = 1 - (1 - a)**r  # fraction of compromised messages
f = 0.05
n_max = 10  # maximum number of iterations
alpha0 = 0.0126 #initial stake 

# Initialize relative stake alpha and delta
alpha = alpha0
epsilon = dphi(alpha) * alpha * gamma0
delta = Prob2(alpha, epsilon, T, q0)

# Loop until delta <= theta or n reaches n_max
n = 2
while delta > theta and n <= n_max:
    alpha = alpha0 / n
    epsilon = dphi(alpha) * alpha * gamma0
    delta = Prob2(alpha, epsilon, T, q0)
    n += 1

# Update alpha and Prob
alpha = alpha0 / (n - 1)
epsilon = dphi(alpha) * alpha * gamma0
delta = Prob2(alpha, epsilon, T, q0)

print("Final num. of nodes:", n-1)
print("Final alpha:", alpha)
print("Final Prob:", delta)

The above program suggests that the stake $\alpha=0.0126$ has to be divided among 5 nodes for the adversarial confidence $\delta(\alpha)<0.5$ when $0.3$ of paths in the mixnet are compromised (this is $80\times 3$, where $3$ is number of layers, mixnet sampled from $800$ nodes with $400$ adversarial nodes) and each message is sent $3$ times giving $0.657$ for the fraction of messages being compromised.

Analysis of naive estimator

The naive estimator of relative stake $\hat{\alpha}_i=\frac{\log\left(1-\hat{P}_i(1)\right)}{\log(1-f)}$ is obtained from the ML estimator by setting $\lambda=0$.