Jensen's Inequality
Last edited: January 1, 2026
linear version
if \(f\) is convex, then for \(x, y \in \text{dom}\, f\) and \(0 \leq \theta \leq 1\):
\begin{equation} f\qty(\theta x + \qty(1-\theta) y) \leq \theta f\qty(x) + \qty(1-\theta) f\qty(y) \end{equation}
probabilistic extension
Let \(f\) be a convex function (for twice-differentiable \(f\), this means \(f''\qty(x) \geq 0\)); let \(x\) be a random variable. Then, \(f\qty(\mathbb{E}[x]) \leq \mathbb{E}\qty[f\qty(x)]\).
Further, if \(f\) is strictly convex, that is \(f''\qty(x) > 0\), then \(\mathbb{E}\qty[f\qty(x)] = f\qty(\mathbb{E}[x])\) holds if and only if \(x\) is constant.
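A quick numerical check of the probabilistic statement (a sketch, not part of the original note): compare \(f\qty(\mathbb{E}[x])\) against \(\mathbb{E}\qty[f\qty(x)]\) on samples, using the strictly convex \(f(x) = x^2\).

```python
import random

def empirical_jensen(f, samples):
    """Return (f(mean of x), mean of f(x)) estimated from samples."""
    mean_x = sum(samples) / len(samples)
    mean_fx = sum(f(x) for x in samples) / len(samples)
    return f(mean_x), mean_fx

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# f(x) = x^2 is strictly convex, so f(E[x]) <= E[f(x)],
# with equality only when x is constant.
lhs, rhs = empirical_jensen(lambda x: x * x, xs)
assert lhs <= rhs
```

For a constant random variable the two sides coincide, matching the strict-convexity equality condition.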
model-free reinforcement learning
In model-based reinforcement learning, we tried real hard to estimate \(T\) and \(R\). What if we just estimated \(Q(s,a)\) directly? model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.
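As a concrete sketch of "estimating \(Q(s,a)\) directly," here is the tabular Q-learning update, one standard model-free method (the note does not commit to a particular algorithm, so treat this as an illustrative assumption):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s,a) toward the sampled
    target r + gamma * max_a' Q(s', a'), never estimating T or R."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

Q = defaultdict(float)  # Q-values default to 0.0
q_update(Q, s=0, a=1, r=1.0, s_next=0, actions=[0, 1])
```

Each update nudges one entry toward a noisy sample of its target, which is exactly why many such samples (and hence much experience) are needed, consistent with the slowness noted above.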
review: estimating mean of a random variable
we have \(m\) points \(x^{(1 \dots m)}\) drawn from \(X\); what is the mean of \(X\)?
\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}
\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}
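The incremental form above can be sketched directly; it matches the batch estimate without storing past samples:

```python
def incremental_mean(samples):
    """Running mean via x̂_m = x̂_{m-1} + (1/m)(x^(m) - x̂_{m-1})."""
    mean = 0.0
    for m, x in enumerate(samples, start=1):
        mean += (x - mean) / m
    return mean

# Agrees with the batch estimate (1/m) * sum(x_i).
assert abs(incremental_mean([1.0, 2.0, 3.0, 4.0]) - 2.5) < 1e-12
```

This constant-memory update is the same pattern model-free methods use to average sampled returns.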
norm
The norm is the “length” of a vector, defined generally using the inner product as:
\begin{equation} \|v\| = \sqrt{\langle v,v \rangle} \end{equation}
additional information
properties of the norm
- nonnegativity: \(\|v\| \geq 0\)
- zero: \(\|v\| = 0\) iff \(v=0\)
- first-degree homogeneity: \(\|\lambda v\| = |\lambda|\|v\|\)
- triangle inequality: \(\|x+y\| \leq \|x\| + \|y\|\)
inner product is a norm
The inner product induces a norm:
- By definition of an inner product, \(\langle v,v \rangle \geq 0\), with \(\langle v,v \rangle = 0\) only when \(v=0\); this gives nonnegativity and the zero property
- The triangle inequality follows from the Cauchy–Schwarz inequality \(|\langle u,v \rangle| \leq \|u\|\|v\|\)
- Homogeneity follows from the algebra:
\begin{align} \|\lambda v\|^{2} &= \langle \lambda v, \lambda v \rangle \\ &= \lambda \langle v, \lambda v \rangle \\ &= \lambda \bar{\lambda} \langle v,v \rangle \\ &= |\lambda |^{2} \|v\|^{2} \end{align}
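A small numerical sketch of the homogeneity argument above, using a complex inner product that is conjugate-linear in the second argument (matching \(\langle v, \lambda v \rangle = \bar{\lambda}\langle v,v \rangle\)):

```python
import math

def inner(u, v):
    """Inner product on C^n, conjugate-linear in the second argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    """Norm induced by the inner product: ||v|| = sqrt(<v, v>)."""
    return math.sqrt(inner(v, v).real)

v = [1 + 2j, 3 - 1j]
lam = 2 - 3j

# first-degree homogeneity: ||λv|| = |λ| ||v||
assert abs(norm([lam * x for x in v]) - abs(lam) * norm(v)) < 1e-9
```

The `.real` is safe because \(\langle v,v \rangle\) is always real and nonnegative for a valid inner product.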
Preference Elicitation
For the weighted sum method, for instance, we need to figure out a \(w\) such that:
\begin{equation} f = w^{\top}\mqty[f_1 \\ \dots\\f_{N}] \end{equation}
where weight \(w \in \triangle_{N}\).
To do this, we essentially infer the weighting scheme by asking “do you like system \(a\) or system \(b\)”.
- first, we collect a series of design variables \((a_1, a_2, a_3 …)\) and \((b_1, b_2, b_3…)\) and we ask “which one do you like better”
- say our user WLOG chose \(b\) over \(a\)
- so we want to design a \(w\) such that \(w^{\top} a < w^{\top} b\)
- meaning, we solve for a \(w\) such that…
\begin{align} \min_{w}&\ \sum_{i=1}^{n} w^{\top}\qty(a_{i}-b_{i}) \\ \text{such that}&\ \bold{1}^{\top} w = 1 \\ &\ w \geq 0 \end{align}
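The program above is a linear objective over the probability simplex, so its minimum sits at a vertex: all weight on the coordinate with the smallest summed coefficient. A minimal sketch under that observation (function name and example data are illustrative; a practical elicitation system would add a margin or regularizer, since the vertex solution is degenerate):

```python
def elicit_weights(comparisons):
    """Given comparisons [(a, b), ...] where the user preferred b over a,
    minimize sum_i w^T (a_i - b_i) over the probability simplex.
    A linear objective over the simplex is minimized at a vertex, so put
    all weight on the coordinate j minimizing c_j = sum_i (a_i[j] - b_i[j])."""
    n = len(comparisons[0][0])
    c = [sum(a[j] - b[j] for a, b in comparisons) for j in range(n)]
    j_star = min(range(n), key=lambda j: c[j])
    return [1.0 if j == j_star else 0.0 for j in range(n)]

# user preferred b (strong on objective 1) over a (strong on objective 0),
# so all weight lands on objective 1, and w^T a < w^T b as required
w = elicit_weights([([0.9, 0.1], [0.2, 0.8])])
```

With several comparisons, each pair contributes its difference to the coefficient vector, and the recovered \(w\) satisfies \(w^{\top} a < w^{\top} b\) whenever the comparisons are consistent.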
