model-free inte

Last edited: January 1, 2026

model-free reinforcement learning

Last edited: January 1, 2026

In model-based reinforcement learning, we worked hard to estimate the transition model \(T\) and the reward model \(R\). What if we estimated \(Q(s,a)\) directly instead? Model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.

review: estimating mean of a random variable

Suppose we have \(m\) samples \(x^{(1 \dots m)}\) of a random variable \(X\). What is the mean of \(X\)?

\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}

Equivalently, the estimate can be updated incrementally as each new sample arrives:

\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}
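
The incremental form can be sketched in a few lines of Python. This is a minimal illustration only: the function name `running_mean` and the sample values are made up for the example, and the final estimate is compared against the batch average to show that the two forms agree.

```python
def running_mean(samples):
    """Yield the running mean after each sample, using the incremental update."""
    mean = 0.0
    for m, x in enumerate(samples, start=1):
        # x_hat_m = x_hat_{m-1} + (1/m) * (x_m - x_hat_{m-1})
        mean += (x - mean) / m
        yield mean


# Illustrative data (hypothetical values, not from the notes)
samples = [2.0, 4.0, 6.0, 8.0]
estimates = list(running_mean(samples))
batch_mean = sum(samples) / len(samples)
```

The update never stores past samples: only the previous estimate and the count \(m\) are needed, which is why this form suits estimates that are refined one observation at a time.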

norm

Last edited: January 1, 2026

The norm is the “length” of a vector, defined generally using the inner product as:

\begin{equation} \|v\| = \sqrt{\langle v,v \rangle} \end{equation}

additional information

properties of the norm

  1. nonnegativity: \(\|v\| \geq 0\)
  2. zero: \(\|v\| = 0\) if and only if \(v=0\)
  3. first-degree homogeneity: \(\|\lambda v\| = |\lambda|\|v\|\)
  4. triangle inequality: \(\|x+y\| \leq \|x\| + \|y\|\)
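
As a quick numerical sanity check (not a proof), the four properties can be tested on concrete vectors. The sketch below assumes the standard dot product on real vectors as the inner product; the helper names `inner` and `norm` and the test vectors are chosen just for this example.

```python
import math


def inner(u, v):
    """Standard dot product on real vectors (one possible inner product)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(v):
    """Norm induced by the inner product: sqrt(<v, v>)."""
    return math.sqrt(inner(v, v))


# Illustrative vectors and scalar (hypothetical values)
x = [3.0, 4.0]
y = [-1.0, 2.0]
lam = -2.5

scaled = [lam * xi for xi in x]
summed = [xi + yi for xi, yi in zip(x, y)]

nonnegative = norm(x) >= 0
zero_only_for_zero = norm([0.0, 0.0]) == 0.0 and norm(x) > 0
homogeneous = math.isclose(norm(scaled), abs(lam) * norm(x))
triangle = norm(summed) <= norm(x) + norm(y)
```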

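inner product induces a norm

The norm induced by an inner product, \(\|v\| = \sqrt{\langle v,v \rangle}\), satisfies the properties of the norm. For example:

  1. zero: by definition of an inner product, \(\langle v,v \rangle = 0\) only when \(v=0\), so \(\|v\| = 0\) only when \(v=0\)
  2. first-degree homogeneity: expand using linearity in the first slot and conjugate-linearity in the second, then take square roots: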

\begin{align} \|\lambda v\|^{2} &= \langle \lambda v, \lambda v \rangle \\ &= \lambda \langle v, \lambda v \rangle \\ &= \lambda \bar{\lambda} \langle v,v \rangle \\ &= |\lambda |^{2} \|v\|^{2} \end{align}
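
The same identity can be checked numerically with a complex scalar, where the conjugate in the second slot matters. This is a small sketch assuming the usual inner product on \(\mathbb{C}^{n}\) (linear in the first slot, conjugate-linear in the second); the names and values are illustrative only.

```python
import math


def inner(u, v):
    """Inner product on C^n: linear in the first slot, conjugate-linear in the second."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))


def norm(v):
    # <v, v> is real and nonnegative; .real drops the zero imaginary part
    return math.sqrt(inner(v, v).real)


# Illustrative vector and scalar (hypothetical values)
v = [1 + 2j, 3 - 1j]
lam = 2 - 3j
lam_v = [lam * vi for vi in v]

lhs = norm(lam_v)          # ||lambda v||
rhs = abs(lam) * norm(v)   # |lambda| ||v||
```

Here \(\langle v,v \rangle = 15\) and \(|\lambda|^{2} = 13\), so both sides square to \(195\).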

Preference Elicitation

Last edited: January 1, 2026

For the weighted sum method, for instance, we need to find a weight vector \(w\) such that:

\begin{equation} f = w^{\top}\mqty[f_1 \\ \vdots \\ f_{N}] \end{equation}

where the weight \(w \in \triangle_{N}\) lies on the probability simplex: its entries are nonnegative and sum to 1.

To do this, we essentially infer the weighting scheme by asking “do you like system \(a\) or system \(b\)”.

  1. first, we collect a series of candidate designs \((a_1, a_2, a_3 …)\) and \((b_1, b_2, b_3…)\), and for each pair we ask “which one do you like better?”
  2. say, without loss of generality, that the user chose \(b\) over \(a\)
  3. then we want a \(w\) such that \(w^{\top} a < w^{\top} b\)
  4. meaning, we solve for a \(w\) such that…

\begin{align} \min_{w}&\ \sum_{i=1}^{n} (a_{i}-b_{i})^{\top} w \\ \text{such that}&\ \bold{1}^{\top} w = 1 \\ &\ w \geq 0 \end{align}
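
A minimal sketch of this program in plain Python follows. It assumes each \(a_i\) and \(b_i\) is a vector of objective values and that \(b_i\) was the preferred one in pair \(i\); the function name `elicit_weights` and the example data are hypothetical. Because the objective is linear and the feasible set is the simplex, an optimum is attained at a vertex, so the sketch simply puts all the weight on the coordinate with the smallest summed cost instead of calling an LP solver.

```python
def elicit_weights(preferences):
    """Solve min_w sum_i (a_i - b_i)^T w over the probability simplex.

    preferences: list of (a, b) pairs of equal-length vectors,
    where b was preferred to a (an assumption of this sketch).
    Returns the weight vector w and the cost vector c = sum_i (a_i - b_i).
    """
    n_obj = len(preferences[0][0])
    # Collapse the sum into a single cost vector so the objective is c^T w
    c = [0.0] * n_obj
    for a, b in preferences:
        for j in range(n_obj):
            c[j] += a[j] - b[j]
    # A linear objective over the simplex is minimized at a vertex:
    # all weight goes to the coordinate with the smallest cost.
    best = min(range(n_obj), key=lambda j: c[j])
    w = [0.0] * n_obj
    w[best] = 1.0
    return w, c


# Two illustrative comparisons (hypothetical values); b was preferred each time
preferences = [
    ([1.0, 5.0], [2.0, 3.0]),
    ([0.0, 4.0], [3.0, 4.5]),
]
w, c = elicit_weights(preferences)
```

With these pairs, the first objective is the one on which the preferred designs consistently score higher, so all of the weight lands there. The vertex solution is a property of this particular formulation; a general-purpose LP solver would be needed once further constraints are added.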

principles of biomedical ethics

Last edited: January 1, 2026
  • autonomy
  • informed consent
  • beneficence
  • non-maleficence
  • justice