Posts

unconc

Last edited: August 8, 2025

underfit

Last edited: August 8, 2025

Undirected Exploration

Last edited: August 8, 2025

base epsilon-greedy:

  1. with probability \(\epsilon\), choose a random action
  2. otherwise (with probability \(1-\epsilon\)), choose the action with the highest estimated value, \(\arg\max_{a} Q(s,a)\)
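
A minimal sketch of the two steps above, assuming the random action is drawn uniformly and the value estimates for the current state are stored in a vector indexed by action. The name `epsilon_greedy` and the `q_values` layout are hypothetical, chosen only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: q_values[a] holds the current estimate Q(s, a)
// for each action a available in the current state s.
std::size_t epsilon_greedy(const std::vector<double>& q_values,
                           double epsilon,
                           std::mt19937& rng) {
    assert(!q_values.empty());
    std::bernoulli_distribution explore(epsilon);
    if (explore(rng)) {
        // Explore: pick any action uniformly at random (an assumption).
        std::uniform_int_distribution<std::size_t> pick(0, q_values.size() - 1);
        return pick(rng);
    }
    // Exploit: pick the action with the largest estimated value.
    auto best = std::max_element(q_values.begin(), q_values.end());
    return static_cast<std::size_t>(std::distance(q_values.begin(), best));
}
```

With `epsilon = 0` this reduces to pure greedy selection, and with `epsilon = 1` it is pure random exploration.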

epsilon-greedy exploration with decay

Some approaches decay \(\epsilon\) over time, so that at each timestep:

\begin{equation} \epsilon \leftarrow \alpha \epsilon \end{equation}

where \(\alpha \in (0,1)\) is called the “decay factor.”
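
To see how quickly exploration shrinks, the update can be unrolled. Writing \(\epsilon_{0}\) for the initial value and \(\epsilon_{t}\) for the value after \(t\) updates (both symbols are introduced here for illustration):

\begin{equation} \epsilon_{t} = \alpha^{t} \epsilon_{0} \end{equation}

For example, with \(\epsilon_{0} = 1\) and \(\alpha = 0.99\), after \(100\) timesteps \(\epsilon_{100} = 0.99^{100} \approx 0.366\), so roughly a third of actions are still random at that point.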

Explore-then-commit

Select actions uniformly at random for \(k\) steps; then switch to the greedy policy and follow it from then on.
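
A rough sketch of this rule, assuming a step counter that starts at zero and value estimates stored per action. The names `explore_then_commit`, `step`, and `q_values` are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: `step` counts the actions taken so far (from 0),
// `k` is the length of the exploration phase, and q_values[a] is the
// current estimate of action a's value.
std::size_t explore_then_commit(const std::vector<double>& q_values,
                                std::size_t step,
                                std::size_t k,
                                std::mt19937& rng) {
    assert(!q_values.empty());
    if (step < k) {
        // Exploration phase: uniform random action.
        std::uniform_int_distribution<std::size_t> pick(0, q_values.size() - 1);
        return pick(rng);
    }
    // Commit phase: always take the greedy action.
    auto best = std::max_element(q_values.begin(), q_values.end());
    return static_cast<std::size_t>(std::distance(q_values.begin(), best));
}
```

Unlike epsilon-greedy, nothing here is random once `step >= k`, so the quality of the committed policy depends entirely on what was learned during the first \(k\) steps.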

unimodal

Last edited: August 8, 2025

unique_lock

Last edited: August 8, 2025

unique_lock is a mutex management type: it locks the mutex when it is constructed, and unlocks the mutex on your behalf when the unique_lock goes out of scope.

this is useful when a function has multiple exit paths, where it is easy to forget to unlock the mutex on some edge case:

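#include <condition_variable>
#include <mutex>
using namespace std;

void my_scope(mutex &mut, condition_variable_any &cv) {
    unique_lock<mutex> lck(mut);  // locks mut here
    // do stuff, you can even pass it to a condition variable!
    cv.wait(lck);  // releases mut while waiting, reacquires it before returning
}  // lck goes out of scope here, so mut is unlocked automatically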
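
To make the multiple-exit-path case concrete, here is a small sketch with an early return. The globals `queue_mutex` and `pending` and the function `try_pop` are hypothetical, made up for illustration only:

```cpp
#include <mutex>
#include <queue>

// Hypothetical shared state, for illustration only.
std::mutex queue_mutex;
std::queue<int> pending;

// Pops the front item into `out` if there is one and returns true;
// returns false if the queue is empty. Neither path calls unlock():
// queue_mutex is released when `lck` is destroyed on return.
bool try_pop(int& out) {
    std::unique_lock<std::mutex> lck(queue_mutex);
    if (pending.empty()) {
        return false;  // early exit: lock released here
    }
    out = pending.front();
    pending.pop();
    return true;       // normal exit: lock released here too
}
```

With a bare `queue_mutex.lock()` instead, the early return would need its own `unlock()` call, which is exactly the kind of edge case that is easy to miss.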