Houjun Liu

knowledgebase testing page

\begin{equation} x_1^{(j)} = x_1^{(j-1)} + Attn\qty(x_{k}^{(j-1)}, \forall k) \end{equation}

\begin{equation} At_{x_{1}^{(j-1)}} = \text{softmax}\qty(\frac{q_{1} k_{j}, \forall j}{\sqrt{d_{\ \text{model}}}}) v_{j} \end{equation}

\begin{equation} At_{x_{1}^{(j-1)}} = \text{softmax}_{\text{top-k cliff}}\qty(\frac{q_{1} k_{j}, \forall j}{\sqrt{d_{\ \text{model}}}}) v_{j} \end{equation}

trigger ci