Softmax Regression

Suppose you have a dataset \(\qty(x^{(1)}, y^{(1)}), \dots, \qty(x^{(n)}, y^{(n)})\), where \(y^{(j)} \in \qty{1,2,3,4}\).

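We can fit a model to this data by maximizing the likelihood over the parameters \(\theta\):

\begin{align} \max_{\theta} L\qty(\theta) &= \prod_{i=1}^{n} p\qty(y^{(i)} \mid x^{(i)}; \theta) \\ &= \prod_{i=1}^{n}\phi_{1}^{1\qty{y^{(i)} = 1}} \dots \phi_{4}^{1\qty{y^{(i)} = 4}} \end{align}

where \(\phi_{k} = p\qty(y = k \mid x^{(i)}; \theta)\) is the probability the model assigns to class \(k\) for example \(i\), and \(1\qty{\cdot}\) is the indicator function. The derivative of the log-likelihood ends up being nice: the gradient with respect to \(\theta_{k}\) is the sum over examples of \(\qty(1\qty{y^{(i)} = k} - \phi_{k}) x^{(i)}\).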
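To make the product above concrete, here is a minimal sketch of evaluating its logarithm. It assumes the per-example class probabilities are already available as plain lists; the function name `log_likelihood` and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

def log_likelihood(probs, labels):
    """Sum of log p(y_i | x_i) for labels in {1, 2, 3, 4}.

    probs[i] is a length-4 list of class probabilities for example i
    (assumed to come from some model; hard-coded below).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # The indicator exponents keep exactly one factor per example: p[y - 1].
        total += math.log(p[y - 1])
    return total

# Hypothetical toy data: two examples with labels 1 and 4.
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [1, 4]
ll = log_likelihood(probs, labels)
```

Working with the log turns the product into a sum, which is also the quantity one would differentiate in practice.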

Derivation

Consider a multinomial distribution over 4 outcomes. Let's write it as a member of the exponential family. Define the sufficient statistic:

\begin{equation} \begin{cases} T\qty(1) = \mqty(1 & 0 & 0) \\ T\qty(2) = \mqty(0&1&0) \\ T\qty(3) = \mqty(0&0&1) \\ T\qty(4) = \mqty(0&0&0) \end{cases} \end{equation}

And, given class probabilities \(\phi_{1},\phi_{2},\phi_{3}\), with \(\phi_{4} = 1 - \phi_{1} - \phi_{2} - \phi_{3}\):

\begin{equation} p\qty(y) = \phi_{1}^{T\qty(y)_{1}} \phi_{2}^{T\qty(y)_{2}}\phi_{3}^{T\qty(y)_{3}}\phi_{4}^{1-\qty(T\qty(y)_{1}+T\qty(y)_{2}+T\qty(y)_{3})} \end{equation}

Taking \(\exp \log \qty(\cdot)\) of the above, we obtain:

\begin{equation} p\qty(y) = \exp \qty(T\qty(y)_{1}\log \frac{\phi_{1}}{\phi_{4}} + T\qty(y)_{2} \log \frac{\phi_{2}}{\phi_{4}} + T\qty(y)_{3} \log \frac{\phi_{3}}{\phi_{4}} + \log\qty(\phi_{4})) \end{equation}

This matches the standard form of an exponential family, \(p\qty(y; \eta) = b\qty(y) \exp \qty(\eta^{\top} T\qty(y) - a\qty(\eta))\), with:

\begin{equation} \eta = \mqty(\log \frac{\phi_{1}}{\phi_{4}} \\ \log \frac{\phi_{2}}{\phi_{4}} \\ \log \frac{\phi_{3}}{\phi_{4}}) \end{equation}

and

\begin{equation} a\qty(\eta) = -\log \qty(\phi_{4}) \end{equation}

\begin{equation} b\qty(y) = 1 \end{equation}
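As a sanity check on the rewrite above, the following sketch evaluates \(b\qty(y)\exp\qty(\eta^{\top}T\qty(y) - a\qty(\eta))\) numerically and compares it with \(\phi_{y}\). The probability vector is a hypothetical example, and mapping class 4 to the zero vector is an assumption implied by the exponent of \(\phi_{4}\) in the pmf.

```python
import math

def T(y):
    """Sufficient statistic: one-hot over classes 1..3.

    Class 4 maps to the zero vector (assumed, as implied by the
    exponent 1 - (T_1 + T_2 + T_3) on phi_4).
    """
    return [1.0 if y == k else 0.0 for k in (1, 2, 3)]

def exp_family_pmf(y, phi):
    """Evaluate b(y) * exp(eta . T(y) - a(eta)) with b(y) = 1."""
    eta = [math.log(phi[k] / phi[3]) for k in range(3)]  # natural parameter
    a = -math.log(phi[3])                                # log-partition term
    dot = sum(e * t for e, t in zip(eta, T(y)))
    return math.exp(dot - a)

phi = [0.1, 0.2, 0.3, 0.4]  # hypothetical class probabilities
values = [exp_family_pmf(y, phi) for y in (1, 2, 3, 4)]
```

Each entry of `values` should agree with the corresponding entry of `phi` up to floating-point error.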

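Solving for \(\phi_{j}\) in terms of \(\eta\): since \(e^{\eta_{i}} = \phi_{i} / \phi_{4}\) and the \(\phi_{j}\) sum to 1, we have \(\phi_{4}\qty(1 + e^{\eta_{1}} + e^{\eta_{2}} + e^{\eta_{3}}) = 1\). Taking \(\eta_{i} = \theta_{i}^{\top} x\), we obtain:

\begin{align} \phi_{i} &= \frac{e^{\eta_{i}}}{1 + e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} \\ &= \frac{e^{\theta_{i}^{\top}x}}{1+\sum_{j=1}^{3} e^{\theta_{j}^{\top} x}} \end{align}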

and we have:

\begin{equation} \phi_{4} = \frac{1}{1+\sum_{j=1}^{3} e^{\theta_{j}^{\top}x}} \end{equation}
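The two formulas above can be sketched as a small function that maps three weight vectors and an input to four class probabilities. The name `class_probs` and the toy inputs are hypothetical; this is a direct transcription of the formulas, not a numerically hardened implementation.

```python
import math

def class_probs(thetas, x):
    """Return [phi_1, phi_2, phi_3, phi_4] for input x.

    thetas holds three weight vectors theta_1..theta_3; class 4 acts as
    the reference class with an implicit score of 0.
    """
    # Linear scores theta_j^T x for j = 1, 2, 3.
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    phis = [math.exp(s) / denom for s in scores]
    phis.append(1.0 / denom)  # phi_4
    return phis

# Hypothetical inputs: zero weights should give a uniform distribution.
uniform = class_probs([[0.0, 0.0]] * 3, [1.5, -2.0])
skewed = class_probs([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [2.0, 0.0])
```

Note that `math.exp` can overflow for very large scores, so this direct form is only suitable for moderate inputs.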

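You may notice that this is the softmax function applied to \(\qty(\eta_{1}, \eta_{2}, \eta_{3}, \eta_{4})\) with \(\eta_{4} = \log \frac{\phi_{4}}{\phi_{4}} = 0\):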

\begin{equation} \frac{e^{\eta_{j}}}{ 1+ e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} = \phi_{j} \end{equation}

for \(j \in \qty{1,2,3}\), and \(\frac{1}{1 + e^{\eta_{1}}+ e^{\eta_{2}}+ e^{\eta_{3}}} = \phi_{4}\).
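The relationship between \(\eta\) and \(\phi\) can be checked with a short round trip. The sketch below, with hypothetical function names and a hypothetical \(\eta\), maps \(\eta\) to \(\phi\), recovers \(\eta\) from the log-ratios, and compares the result with an ordinary softmax over \(\eta\) padded with a zero for class 4.

```python
import math

def eta_to_phi(eta):
    """Map (eta_1, eta_2, eta_3) to (phi_1, ..., phi_4)."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta] + [1.0 / denom]

def phi_to_eta(phi):
    """Inverse map: eta_k = log(phi_k / phi_4) for k = 1, 2, 3."""
    return [math.log(phi[k] / phi[3]) for k in range(3)]

def softmax(z):
    """Plain softmax; subtracting the max is a common stability trick."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

eta = [0.5, -1.0, 2.0]           # hypothetical natural parameters
phi = eta_to_phi(eta)
recovered = phi_to_eta(phi)      # should match eta
padded = softmax(eta + [0.0])    # softmax with the class-4 score pinned to 0
```

Pinning the fourth score to zero removes the redundancy in the softmax, since adding a constant to every score leaves the probabilities unchanged.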