A Generalized Linear Model is a model of data with the following properties:
- The model for \(P(y \mid x; \theta)\) should come from an exponential family, chosen according to the type of \(y\): for real-valued data we pick the Gaussian distribution; for binary data, the Bernoulli distribution; for counts, the Poisson distribution; for \(\mathbb{R}^{+}\), the gamma or exponential distribution; and for distributions over distributions, the Beta or Dirichlet distribution.
- \(\eta = \theta^{T}x\), where \(\theta,x \in \mathbb{R}^{d}\)
- at test time…
  - we want to output \(\mathbb{E}\qty [y|x; \theta]\)
  - so our predictor is written as \(h_{\theta}\qty(x) = \mathbb{E}\qty [y|x; \theta]\)
- at train time, we maximize log likelihood \(\max_{\theta} \sum_{i=1}^{n} \log P\qty(y^{(i)} | \theta^{T}x^{(i)})\)
- to update using gradient ascent, \(\theta_{j} := \theta_{j} + \alpha \sum_{i=1}^{n} \qty(y^{(i)} - h_{\theta}\qty(x^{(i)}))x_{j}^{(i)}\)
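The train-time update above can be sketched in a few lines. This is a minimal sketch in plain Python, assuming a Bernoulli response (the sigmoid). The names `fit_glm` and `predict`, the toy data, the step size, and the iteration count are illustrative assumptions, not part of these notes.

```python
import math

def sigmoid(eta):
    """Response function for the Bernoulli family: maps eta to the mean of y."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict(theta, x, response):
    """h_theta(x) = E[y | x; theta] = g(theta^T x)."""
    eta = sum(t * xj for t, xj in zip(theta, x))
    return response(eta)

def fit_glm(xs, ys, response, alpha=0.05, steps=5000):
    """Batch gradient ascent on the log likelihood (hypothetical helper)."""
    d = len(xs[0])
    theta = [0.0] * d
    for _ in range(steps):
        # Residuals y^(i) - h_theta(x^(i)), all computed with the current theta.
        residuals = [y - predict(theta, x, response) for x, y in zip(xs, ys)]
        # theta_j := theta_j + alpha * sum_i residual_i * x_j^(i)
        theta = [
            theta[j] + alpha * sum(r * x[j] for r, x in zip(residuals, xs))
            for j in range(d)
        ]
    return theta

# Made-up toy data; the first feature is a constant 1 acting as an intercept.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
ys = [0, 0, 1, 0, 1, 1]

theta = fit_glm(xs, ys, sigmoid)
print(theta, predict(theta, [1.0, 5.0], sigmoid))
```

Swapping `sigmoid` for another response function (for example `math.exp` for Poisson counts) leaves the update loop unchanged, which is the practical appeal of the shared update rule. A different response function may need a smaller step size to stay stable.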
We also have two fancy names for things
components of exponential family
canonical response function
\begin{equation} \mu = \mathbb{E}[y|\eta] = g\qty(\eta) \end{equation}
canonical link function
\begin{equation} \eta = g^{-1}\qty(\mu) \end{equation}
expectation
\begin{equation} g\qty(\eta) = \pdv{\eta} a\qty(\eta) \end{equation}
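The identity \(g\qty(\eta) = \pdv{\eta} a\qty(\eta)\) can be checked numerically. The sketch below assumes the standard log-partition functions \(a\qty(\eta) = \log\qty(1 + e^{\eta})\) for Bernoulli and \(a\qty(\eta) = e^{\eta}\) for Poisson, which are not stated in these notes. It compares a central finite difference against the known means; the function names are illustrative.

```python
import math

def log_partition_bernoulli(eta):
    """a(eta) for the Bernoulli family (assumed standard form)."""
    return math.log(1.0 + math.exp(eta))

def log_partition_poisson(eta):
    """a(eta) for the Poisson family (assumed standard form)."""
    return math.exp(eta)

def numerical_derivative(f, eta, h=1e-6):
    """Central finite-difference approximation of df/d(eta)."""
    return (f(eta + h) - f(eta - h)) / (2.0 * h)

def sigmoid(eta):
    """Response function for Bernoulli: mu = 1 / (1 + e^{-eta})."""
    return 1.0 / (1.0 + math.exp(-eta))

eta = 0.7  # arbitrary test point
bernoulli_mean = numerical_derivative(log_partition_bernoulli, eta)
poisson_mean = numerical_derivative(log_partition_poisson, eta)

# Each difference should be close to zero if g(eta) = d a(eta) / d eta holds.
print(bernoulli_mean - sigmoid(eta))
print(poisson_mean - math.exp(eta))
```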
example
- assume \(y \mid x; \theta \sim \text{ExpFam}\qty(\eta)\), where \(\eta = \theta^{T}x\); here we take the Bernoulli distribution, which is in the exponential family.
- Recall \(P\qty(y=1|x; \theta) = \phi = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-\theta^{T}x}}\)
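For reference, here is a short sketch of where the sigmoid comes from. It assumes the usual exponential family form \(p\qty(y; \eta) = b\qty(y) \exp\qty(\eta\, T\qty(y) - a\qty(\eta))\); the symbols \(b\) and \(T\) are not defined elsewhere in these notes. Writing the Bernoulli probability mass function in that form:

\begin{equation} p\qty(y; \phi) = \phi^{y}\qty(1-\phi)^{1-y} = \exp\qty(y \log \frac{\phi}{1-\phi} + \log\qty(1-\phi)) \end{equation}

Matching terms gives \(T\qty(y) = y\), \(b\qty(y) = 1\), and the link function

\begin{equation} \eta = \log \frac{\phi}{1-\phi} \end{equation}

Solving for \(\phi\) recovers the response function:

\begin{equation} \phi = \frac{1}{1+e^{-\eta}}, \qquad a\qty(\eta) = -\log\qty(1-\phi) = \log\qty(1+e^{\eta}) \end{equation}

So with \(\eta = \theta^{T}x\), the predictor \(h_{\theta}\qty(x) = \mathbb{E}\qty[y|x; \theta] = \phi\) is exactly logistic regression.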