xavier initialization

A neural network initialization scheme that tries to avoid Vanishing Gradients.

Consider the \(Wx\) step in a neural network:

\begin{equation} o_{i} = \sum_{j=1}^{n_{\text{in}}} w_{ij} x_{j} \end{equation}

Assuming the weights \(w_{ij}\) are i.i.d. with zero mean and variance \(\sigma^{2}\), and the inputs \(x_{j}\) are independent of the weights with zero mean and variance \(v^{2}\), the variance of this is:

\begin{equation} \text{Var}\qty [o_{i}] = n_{\text{in}} \sigma^{2} v^{2} \end{equation}
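
We can sanity-check this identity numerically. A minimal sketch using NumPy; the sizes and variances (`n_in`, `sigma2`, `v2`, `trials`) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in = 128      # fan-in of the layer
sigma2 = 0.02   # Var[w_ij], weight variance
v2 = 1.5        # Var[x_j], input variance

# Each row is one independent realization of (w, x);
# o_i = sum_j w_ij * x_j is computed once per row.
trials = 20_000
W = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n_in))
x = rng.normal(0.0, np.sqrt(v2), size=(trials, n_in))
o = (W * x).sum(axis=1)

print(o.var())             # empirical Var[o_i]
print(n_in * sigma2 * v2)  # predicted n_in * sigma^2 * v^2
```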
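
To keep activations from growing or shrinking, we want \(\text{Var}\qty [o_{i}] = v^{2}\), which forces \(n_{\text{in}} \sigma^{2} = 1\); the same argument on the backward pass gives \(n_{\text{out}} \sigma^{2} = 1\). Xavier initialization splits the difference and sets \(\sigma^{2} = \frac{2}{n_{\text{in}} + n_{\text{out}}}\). A minimal sketch of such an initializer (the helper `xavier_normal` is illustrative, not a library API):

```python
import numpy as np

def xavier_normal(n_in: int, n_out: int, rng=None) -> np.ndarray:
    """Draw an (n_out, n_in) weight matrix with Var[w_ij] = 2 / (n_in + n_out)."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, sigma, size=(n_out, n_in))

W = xavier_normal(256, 128)
print(W.var())  # ~ 2 / (256 + 128) ≈ 0.0052
```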