Owner: 🟢 @Alexander Mozeika
Reviewer: 🟢 @Marvin Jones 🟢 @Álvaro Castro-Castilla
<aside> 💡
Details of mathematical derivations, with references to literature, and additional numerical results are in the Appendix.
</aside>
We consider anonymous communication system which uses a layered network of mix nodes, i.e. the mix network. We assume that nodes are used in the mix network are sampled from $N$ nodes of a network where $M$ nodes are adversarial. Presence of adversarial nodes in a communication path of the mix network can cause anonymity or communication failure. For anonymity failure to occur in a communication path all mix nodes on this path have to be adversarial and such path is compromised. In a mix network with $L$ layers naive reasoning (e.g. see the article) gives $(M/N)^L$ for the fraction of compromised paths. The latter is correct for a mix network of infinite width, but when the width is finite the fraction of compromised paths can be significantly larger than $(M/N)^L$. In this document we study statistical properties of the fraction compromised paths. Result of this work allows us to estimate the probability that the fraction of compromised paths is greater than some threshold. The latter can be used to optimise layered mix networks.
To construct the mixnet we sample $n$ nodes (without replacement) from the population of $N$ nodes.
At most $M$ , where $M<N$, nodes in the population are adversarial.
The number of adversarial nodes $m$ in the mixnet is random number from the hypergeometric probability distribution $P\left(m\vert N,n,M\right)=\frac{{n\choose m}{N-n\choose M-m}}{{N\choose M}}$
We have $L$ layers constructed from $n=\sum_{\mu=1}^L n_\mu$ nodes, where $n_\mu$ is the number of nodes in the layer $\mu$.
The probability that the number of adversarial nodes in layer 1 is $m_1$, in layer 2 is $m_2$, etc., is given by the multivariate hypergeometric distribution $P\left(m_{1},\ldots, m_{L}\vert n_{1},\ldots,n_{L}; m\right)=\frac{\delta_{m;\sum_{\mu=1}^L m_{\mu}}\prod_{\mu=1}^L {n_\mu\choose m_\mu}}{{n\choose m}},$
where $\\delta_{x;y}$ in above is the [Kronecker delta function](<https://mathworld.wolfram.com/KroneckerDelta.html>).
From above follows the joint probability distribution $P\left(m_{1},\ldots, m_{L}\vert n_{1},\ldots,n_{L}\right)=\sum_{m=0}^nP\left(m_{1},\ldots, m_{L}\vert n_{1},\ldots,n_{L}; m\right)P\left(m\vert N,n,M\right)$