_index.org

Directed Exploration

Last edited: August 8, 2025

Softmax Method

Pull arm \(a\) with probability \(\propto \exp (\lambda \rho_{a})\), where \(\lambda \geq 0\) is the “precision parameter”.

When \(\lambda \to 0\), every arm is pulled with equal probability, so you are essentially sampling uniformly at random; when \(\lambda \to \infty\), the system pulls only the greedy arm, because the arm with the largest \(\rho_{a}\) takes all of the probability mass.

For a multi-state case, take action \(a\) in state \(s\) with probability:

\begin{equation} P(a \mid s) \propto \exp (\lambda Q(s,a)) \end{equation}
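The softmax rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `softmax_probs` and `pull_arm`, the example values in `rho`, and the toy table `Q` are all assumptions made for the example, not part of the original note. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the normalized probabilities.

```python
import math
import random


def softmax_probs(values, lam):
    """Return probabilities proportional to exp(lam * value)."""
    # Shift by the max so exp() never overflows; the shift cancels
    # out when we normalize.
    m = max(values)
    weights = [math.exp(lam * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def pull_arm(values, lam, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probs(values, lam)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


# Hypothetical estimates rho_a for three arms.
rho = [0.2, 0.5, 0.9]

uniform = softmax_probs(rho, 0.0)     # lam -> 0: every arm equally likely
greedy = softmax_probs(rho, 1000.0)   # large lam: mass piles onto the best arm

# Multi-state case: apply the same rule to the row Q[s].
# Q is a made-up table mapping each state to its action values.
Q = {"s0": [1.0, 0.0], "s1": [0.0, 2.0]}
probs_s1 = softmax_probs(Q["s1"], 1.0)
```

With `lam = 0` all entries of `uniform` are equal, and with a large `lam` nearly all of the mass in `greedy` sits on the last arm, which matches the two limits described above.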

disambiguation: regular expression

Last edited: August 8, 2025

Regular expressions refer to the same underlying thing, but can be treated in two senses.

discourse features

Last edited: August 8, 2025

Discourse features are markers, such as fluency, that characterize one’s speech.

Discourse-Completion Task

Last edited: August 8, 2025

A Discourse-Completion Task is a tool used to elicit speech acts, for example by showing the participant an image as a prompt. For instance,

types of Discourse-Completion Tasks

discrete distribution

Last edited: August 8, 2025

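A discrete distribution describes a random variable with a countable set of outcomes: a die roll, a coin flip, etc.

We use a probability mass function (PMF) to model such a distribution; the probabilities of all outcomes must sum to one: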

\begin{equation} \sum_{i=1}^{n}P(X=i) = 1 \end{equation}

To each outcome of the distribution, we assign a factor (its probability). The parameters of this distribution are the probability values you assign to each outcome.
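As a small illustration of the probability mass function described above, here is a sketch that stores a fair die as a table of outcome probabilities, checks that they sum to one, and draws samples from it. The names `die_pmf`, `is_valid_pmf`, and `sample`, as well as the tolerance and seed, are assumptions chosen for this example.

```python
import random
from collections import Counter

# The parameters of the distribution: one probability per outcome.
die_pmf = {face: 1 / 6 for face in range(1, 7)}


def is_valid_pmf(pmf, tol=1e-9):
    """Check that all probabilities are non-negative and sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1.0) <= tol
    return non_negative and sums_to_one


def sample(pmf, n, seed=0):
    """Draw n outcomes according to the PMF (seeded for repeatability)."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [pmf[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)


draws = sample(die_pmf, 6000)
counts = Counter(draws)  # empirical frequency of each face
```

For a fair die each face should appear roughly a sixth of the time in `counts`, though the exact counts depend on the seed.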