_index.org

Policy Optimization

Last edited: August 8, 2025

Policy Optimization deals with algorithms that, unlike value iteration/policy iteration/online planning which uses a surrogate (like value function or some future discounted reward) to calculate a policy, directly optimizes against policy parameters \(\theta\) for a policy \(\pi_{\theta}\).

Polio

Last edited: August 8, 2025

polynomial

Last edited: August 8, 2025

A polynomial is a polynomial

constituents

  • a function \(p: \mathbb{F} \to \mathbb{F}\)
  • coefficient \(a_0, \dots, a_{m} \in \mathbb{F}\)

requirements

A polynomial is defined by:

\begin{equation} p(z)=a_0+a_1z+a_2z^{2}+\dots +a_{m}z^{m} \end{equation}

for all \(z \in \mathbb{F}\)

additional information

degree of a polynomial \(\deg p\)

A polynomial’s degree is the value of the highest non-zero exponent. That is, for a polynomial:

\begin{equation} p(z) = a_0+a_1z+\dots +a_{m}z^{m} \end{equation}

with \(a_{m} \neq 0\), the degree of it is \(m\). We write \(\deg p = m\).

polynomial hierarchy

Last edited: August 8, 2025

\begin{equation} \text{PH} = \bigcup_{c \in \mathbb{N}} \Sigma_{c} = \bigcup_{c \in \mathbb{N}} \pi_{c} \end{equation}

“a language is in the polynomial hierarchy if it can be described with a constant number of qualifiers”

results and conjectures

polynomial hierarchy conjecture

“the polynomial hierarchy is infinite”—that is, each arrow to a harder language is strict.

\(P \neq NP\), \(NP \neq \pi_{2}\)

PSPACE bounds the entire hierarchy

\(\text{PH} \subseteq \text{PSPACE}\)

because for every \(\exists\) you only have to keep one of it around.

polynomial interpolation

Last edited: August 8, 2025

constituents

\(m\) data points \(\qty(x_i,y_{i})\)

requirements

we desire \(c_{j}\) such that:

\begin{equation} y = c_1 + c_{2} x + c_3 x^{2} + \dots \end{equation}

Given our set of basis functions \(\phi_{j}(x)\) for input \(x\), our goal is:

\begin{equation} y = c_1 \phi_{1} + c_2 \phi_{2} + \dots + c_{n}\phi_{n} \end{equation}

the \(\phi\) are the model function which determines our neural networks.

additional information

Monomial basis and vandermonde Matrix

to do this, we put stuff in matrix form following forms, called the matrix of monomial basis: