model-free reinforcement learning
Last edited: January 1, 2026
In model-based reinforcement learning, we worked hard to estimate the transition model \(T\) and the reward model \(R\). What if we just estimated \(Q(s,a)\) directly? Model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.
\begin{equation} \frac{1}{2} \qty(\frac{1}{2}) \end{equation}
review: estimating mean of a random variable
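Suppose we have \(m\) samples \(x^{(1)}, \dots, x^{(m)}\) of a random variable \(X\). What is the mean of \(X\)? The sample mean is:
\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}
Equivalently, the estimate can be updated incrementally as each new sample arrives:
\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}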
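As a minimal sketch of the incremental update above, the following Python snippet keeps a running estimate and checks it against the batch mean. The function name `running_mean` and the sample values are illustrative assumptions, not part of the original notes.

```python
def running_mean(samples):
    """Yield the mean estimate after each sample, using the incremental update."""
    mean = 0.0
    for m, x in enumerate(samples, start=1):
        # x_hat_m = x_hat_{m-1} + (1/m) * (x^(m) - x_hat_{m-1})
        mean += (x - mean) / m
        yield mean


samples = [2.0, 4.0, 6.0, 8.0]           # hypothetical data
estimates = list(running_mean(samples))   # one estimate per sample seen
batch_mean = sum(samples) / len(samples)  # the direct (1/m) * sum formula
```

The last incremental estimate agrees with the batch mean, and the update only needs the previous estimate and the count \(m\), not the full history of samples.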
norm
Last edited: January 1, 2026
The norm is the “length” of a vector, defined generally using the inner product as:
\begin{equation} \|v\| = \sqrt{\langle v,v \rangle} \end{equation}
additional information
properties of the norm
- nonnegativity: \(\|v\| \geq 0\)
- zero: \(\|v\| = 0\) if and only if \(v=0\)
- first-degree homogeneity: \(\|\lambda v\| = |\lambda|\|v\|\)
- triangle inequality: \(\|x+y\| \leq \|x\| + \|y\|\)
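The properties above can be spot-checked numerically. This is a minimal sketch, assuming the standard inner product on complex vectors (linear in the first argument, conjugate-linear in the second). The helper names `inner` and `norm` and the test vectors are illustrative assumptions. A numerical check on a few vectors is not a proof.

```python
import math


def inner(u, v):
    # Assumed standard inner product: sum of u_i * conjugate(v_i)
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def norm(v):
    # ||v|| = sqrt(<v, v>); <v, v> is real, so drop the zero imaginary part
    return math.sqrt(inner(v, v).real)


x = [3 + 4j, 0j]
y = [1 + 0j, 2 - 1j]
lam = 2 - 1j
zero = [0j, 0j]

scaled = [lam * a for a in x]           # lambda * x
summed = [a + b for a, b in zip(x, y)]  # x + y

nonnegative = norm(x) >= 0
zero_norm = norm(zero) == 0
homogeneous = math.isclose(norm(scaled), abs(lam) * norm(x))
triangle = norm(summed) <= norm(x) + norm(y)
```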
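the inner product induces a norm
The function \(\|v\| = \sqrt{\langle v,v \rangle}\) induced by an inner product satisfies the properties of a norm:
- By definition of an inner product, \(\langle v,v \rangle \geq 0\), and \(\langle v,v \rangle = 0\) only when \(v=0\)
- For homogeneity, see algebra:
\begin{align} \|\lambda v\|^{2} &= \langle \lambda v, \lambda v \rangle \\ &= \lambda \langle v, \lambda v \rangle \\ &= \lambda \bar{\lambda} \langle v,v \rangle \\ &= |\lambda |^{2} \|v\|^{2} \end{align}
Taking square roots of both sides gives \(\|\lambda v\| = |\lambda|\|v\|\). The triangle inequality follows from the Cauchy-Schwarz inequality, \(|\langle x,y \rangle| \leq \|x\|\|y\|\).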
Preference Elicitation
Last edited: January 1, 2026
For the weighted sum method, for instance, we need to choose a weight vector \(w\) such that:
\begin{equation} f = w^{\top}\mqty[f_1 \\ \vdots \\ f_{N}] \end{equation}
where the weight \(w \in \triangle_{N}\), that is, \(w \geq 0\) and \(\bold{1}^{\top} w = 1\).
To do this, we essentially infer the weighting scheme by asking “do you like system \(a\) or system \(b\)”.
- first, we collect a series of designs \((a_1, a_2, a_3, \dots)\) and \((b_1, b_2, b_3, \dots)\), and for each pair \((a_i, b_i)\) we ask “which one do you like better?”
- say, WLOG, our user chose \(b\) over \(a\)
- so we want to find a \(w\) such that \(w^{\top} a < w^{\top} b\)
- meaning, we solve for a \(w\) such that:
\begin{align} \min_{w}&\ \sum_{i=1}^{n} w^{\top} (a_{i}-b_{i}) \\ \text{such that}&\ \bold{1}^{\top} w = 1 \\ &\ w \geq 0 \end{align}
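The optimization above can be sketched in a few lines of Python. The objective is linear in \(w\) and the feasible set is the simplex, so a minimizer always sits at a vertex. The sketch below therefore just picks the coordinate with the smallest summed difference instead of calling an LP solver. The function name `elicit_weights` and the example pairs are hypothetical, and each \(a_i\), \(b_i\) is assumed to be a vector of objective values.

```python
def elicit_weights(pairs):
    """Solve min_w sum_i w^T (a_i - b_i) subject to 1^T w = 1, w >= 0.

    pairs: list of (a, b) tuples of equal-length vectors, where the user
    chose b over a (following the convention in the text above).
    """
    n_objectives = len(pairs[0][0])
    # c_j = sum_i (a_i[j] - b_i[j]); the objective is then w^T c
    c = [sum(a[j] - b[j] for a, b in pairs) for j in range(n_objectives)]
    # A linear objective over the simplex is minimized at a vertex e_j,
    # where j is the index of the smallest entry of c.
    best = min(range(n_objectives), key=lambda j: c[j])
    w = [0.0] * n_objectives
    w[best] = 1.0
    return w, c


pairs = [((1.0, 3.0), (2.0, 1.0)), ((2.0, 4.0), (3.0, 2.0))]  # hypothetical answers
w, c = elicit_weights(pairs)

# Check the desired ordering w^T a < w^T b on each pair
scores = [(sum(wi * ai for wi, ai in zip(w, a)),
           sum(wi * bi for wi, bi in zip(w, b))) for a, b in pairs]
```

Because the minimizer is a vertex, this formulation puts all of the weight on a single objective. The summed objective also does not guarantee \(w^{\top} a_i < w^{\top} b_i\) for every individual pair, so the per-pair check at the end is worth keeping.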
principles of biomedical ethics
Last edited: January 1, 2026
- autonomy
- informed consent
- beneficence
- non-maleficence
- justice
