motivation
Consider generic maximum likelihood estimation.
- parametric distribution estimation: suppose you have a family of densities \(p_{x}\qty(y)\), with parameter \(x\)
- we take \(p_{x}\qty(y) = 0\) for invalid values of \(x\)
maximum likelihood estimation: choose \(x\) to maximize \(p_{x}\qty(y)\) given the observed data \(y\).
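As a concrete instance (a minimal numpy sketch on synthetic data; all names here are hypothetical): for a Gaussian family with unknown mean \(x\) and known variance, the ML estimate is the sample mean.

```python
import numpy as np

# Sketch: ML estimation for a Gaussian family with unknown mean x.
# The negative log-likelihood of IID samples y under N(x, 1) is
#   sum_i (y_i - x)^2 / 2  (up to an additive constant),
# which is minimized at the sample mean.
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=1000)  # synthetic dataset

def neg_log_likelihood(x, y):
    return 0.5 * np.sum((y - x) ** 2)  # up to an additive constant

x_ml = y.mean()  # closed-form maximizer of the likelihood

# The sample mean beats any perturbed candidate:
for dx in (-0.5, 0.1, 2.0):
    assert neg_log_likelihood(x_ml, y) < neg_log_likelihood(x_ml + dx, y)
```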
linear measurement with IID noise
Suppose you have some kind of linear noise model:
\begin{equation} y_{i} = a_{i}^{T}x + v_{i} \end{equation}
where \(v_{i}\) is IID noise and the \(a_{i}^{T}\) are the known measurement vectors. The density of \(y\) is then:
\begin{equation} p_{x}\qty(y) = \prod_{i=1}^{m} p\qty(y_{i} - a_{i}^{T}x) \end{equation}
for some model \(p\) of the noise \(v\). The maximum likelihood estimate is therefore the solution of:
\begin{align} \max_{x}\quad & \sum_{i=1}^{m} \log p\qty(y_{i} - a_{i}^{T}x) \end{align}
with observed \(y\) and known measurement vectors \(a_{i}\).
some noise models
- Gaussian noise: ML estimate becomes least-squares
- Laplacian noise: ML estimate is the l1-norm (least absolute deviations) solution
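A small numpy sketch of the Gaussian case (synthetic data, hypothetical names): maximizing the likelihood over \(x\) is exactly ordinary least squares.

```python
import numpy as np

# Sketch: linear measurements y_i = a_i^T x + v_i with IID Gaussian noise.
# The ML estimate maximizes sum_i log p(y_i - a_i^T x), which for Gaussian
# noise is equivalent to minimizing ||A x - y||_2^2: ordinary least squares.
rng = np.random.default_rng(1)
m, n = 100, 3
A = rng.normal(size=(m, n))                 # rows are the a_i^T
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.1 * rng.normal(size=m)   # Gaussian measurement noise

x_ml, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares = ML estimate

def log_likelihood(x):
    # Gaussian log-density of the residuals, up to additive constants
    return -0.5 * np.sum((y - A @ x) ** 2)

# Perturbing the least-squares solution can only lower the likelihood:
for d in rng.normal(size=(5, n)):
    assert log_likelihood(x_ml) >= log_likelihood(x_ml + 0.1 * d)
```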
logistic regression
Random variable \(y \in \qty{0,1}\) with distribution:
\begin{equation} \mathrm{prob}\qty(y = 1) = \frac{\exp \qty(a^{T}u + b)}{1 + \exp \qty(a^{T}u + b)} \end{equation}
where \(u\) is the explanatory variable and \(a, b\) are the parameters to estimate.
The log-likelihood is concave in \(\qty(a, b)\), so maximum likelihood estimation is again a convex optimization problem.
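A minimal numpy sketch of the logistic ML problem (synthetic data; step size and iteration count are illustrative assumptions): because the log-likelihood is concave, plain gradient ascent on \(\qty(a, b)\) makes steady progress toward the maximizer.

```python
import numpy as np

# Sketch: logistic regression ML by gradient ascent on the concave
# log-likelihood  L(a,b) = sum_i [ y_i (a^T u_i + b) - log(1 + exp(a^T u_i + b)) ].
rng = np.random.default_rng(2)
m, n = 200, 2
U = rng.normal(size=(m, n))                   # explanatory variables u_i
a_true, b_true = np.array([1.5, -1.0]), 0.5
p = 1.0 / (1.0 + np.exp(-(U @ a_true + b_true)))
y = (rng.uniform(size=m) < p).astype(float)   # Bernoulli labels

def log_likelihood(a, b):
    z = U @ a + b
    return np.sum(y * z - np.logaddexp(0.0, z))  # log(1 + e^z), computed stably

a, b = np.zeros(n), 0.0
for _ in range(500):                          # gradient ascent; concavity ensures
    z = U @ a + b                             # no spurious local maxima
    r = y - 1.0 / (1.0 + np.exp(-z))          # residual y_i - prob(y_i = 1)
    a += 0.01 * (U.T @ r)
    b += 0.01 * np.sum(r)

# The objective strictly improves over the zero initialization:
assert log_likelihood(a, b) > log_likelihood(np.zeros(n), 0.0)
```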
