A neural network initialization scheme that tries to avoid Vanishing Gradients.
Consider the \(Wx\) step in a neural network, where each output is a weighted sum of the inputs:
\begin{equation} o_{i} = \sum_{j=1}^{n_{\text{in}}} w_{ij} x_{j} \end{equation}
Assume the weights \(w_{ij}\) are i.i.d. with zero mean and variance \(\sigma^{2}\), and the inputs \(x_{j}\) are i.i.d. with zero mean and variance \(v^{2}\), independent of the weights. Then the cross terms vanish, each term \(w_{ij} x_{j}\) contributes variance \(\sigma^{2} v^{2}\), and the \(n_{\text{in}}\) independent terms add, giving:
\begin{equation} \text{Var}\qty [o_{i}] = n_{\text{in}} \sigma^{2} v^{2} \end{equation}
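A minimal NumPy sketch that checks this formula empirically; the variable names (\texttt{n\_in}, \texttt{sigma}, \texttt{v}) mirror the symbols above and are illustrative, not from the source:

```python
import numpy as np

# Empirical check of Var[o_i] = n_in * sigma^2 * v^2 under the
# zero-mean, independence assumptions stated above.
rng = np.random.default_rng(0)

n_in = 256        # fan-in of the layer
sigma = 0.05      # standard deviation of each weight w_ij
v = 1.0           # standard deviation of each input x_j
n_samples = 100_000

W = rng.normal(0.0, sigma, size=(n_samples, n_in))  # fresh weight row per sample
x = rng.normal(0.0, v, size=(n_samples, n_in))      # fresh input per sample
o = (W * x).sum(axis=1)                             # o_i = sum_j w_ij x_j

print("empirical Var[o_i]:", o.var())               # close to the prediction
print("predicted Var[o_i]:", n_in * sigma**2 * v**2)  # 256 * 0.05^2 * 1 = 0.64
```

Note that choosing \(\sigma^{2} = 1/n_{\text{in}}\) makes \(\text{Var}\qty[o_{i}] = v^{2}\), so the activation variance is preserved from layer to layer rather than shrinking toward zero; this is the motivation behind fan-in-scaled schemes such as Xavier/Glorot initialization.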
