Policy Optimization
Last edited: August 8, 2025Policy Optimization deals with algorithms that, unlike value iteration/policy iteration/online planning which uses a surrogate (like value function or some future discounted reward) to calculate a policy, directly optimizes against policy parameters \(\theta\) for a policy \(\pi_{\theta}\).
Polio
Last edited: August 8, 2025polynomial
Last edited: August 8, 2025A polynomial is a polynomial
constituents
- a function \(p: \mathbb{F} \to \mathbb{F}\)
- coefficient \(a_0, \dots, a_{m} \in \mathbb{F}\)
requirements
A polynomial is defined by:
\begin{equation} p(z)=a_0+a_1z+a_2z^{2}+\dots +a_{m}z^{m} \end{equation}
for all \(z \in \mathbb{F}\)
additional information
degree of a polynomial \(\deg p\)
A polynomial’s degree is the value of the highest non-zero exponent. That is, for a polynomial:
\begin{equation} p(z) = a_0+a_1z+\dots +a_{m}z^{m} \end{equation}
with \(a_{m} \neq 0\), the degree of it is \(m\). We write \(\deg p = m\).
polynomial hierarchy
Last edited: August 8, 2025\begin{equation} \text{PH} = \bigcup_{c \in \mathbb{N}} \Sigma_{c} = \bigcup_{c \in \mathbb{N}} \pi_{c} \end{equation}
“a language is in the polynomial hierarchy if it can be described with a constant number of qualifiers”
results and conjectures
polynomial hierarchy conjecture
“the polynomial hierarchy is infinite”—that is, each arrow to a harder language is strict.
\(P \neq NP\), \(NP \neq \pi_{2}\)
PSPACE bounds the entire hierarchy
\(\text{PH} \subseteq \text{PSPACE}\)
because for every \(\exists\) you only have to keep one of it around.
polynomial interpolation
Last edited: August 8, 2025constituents
\(m\) data points \(\qty(x_i,y_{i})\)
requirements
we desire \(c_{j}\) such that:
\begin{equation} y = c_1 + c_{2} x + c_3 x^{2} + \dots \end{equation}
Given our set of basis functions \(\phi_{j}(x)\) for input \(x\), our goal is:
\begin{equation} y = c_1 \phi_{1} + c_2 \phi_{2} + \dots + c_{n}\phi_{n} \end{equation}
the \(\phi\) are the model function which determines our neural networks.
additional information
Monomial basis and vandermonde Matrix
to do this, we put stuff in matrix form following forms, called the matrix of monomial basis:
