Directed Exploration
Last edited: August 8, 2025
Softmax Method
Pull arm \(a\) with probability \(\propto \exp (\lambda \rho_{a})\), where \(\lambda \geq 0\) is the “precision parameter”.
When \(\lambda \to 0\), every action is chosen with equal probability, so you are essentially sampling uniformly at random; when \(\lambda \to \infty\), the policy becomes greedy, because only the arm with the largest \(\rho_{a}\) gets selected.
For a multi-state case:
\begin{equation} P(a \mid s) \propto \exp (\lambda Q(s,a)) \end{equation}
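The softmax rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `softmax_probs` and `pull_arm` are hypothetical, and `rho` is assumed to be a plain list holding one value \(\rho_{a}\) per arm (for the multi-state case, pass the row \(Q(s,\cdot)\) for the current state instead).

```python
import math
import random


def softmax_probs(rho, lam):
    """Return probabilities proportional to exp(lam * rho[a]).

    rho: list of per-arm values (assumed input format).
    lam: precision parameter, lam >= 0.
    """
    # Subtracting the max leaves the ratios unchanged but avoids overflow
    # in exp() when lam is large.
    top = max(rho)
    weights = [math.exp(lam * (r - top)) for r in rho]
    total = sum(weights)
    return [w / total for w in weights]


def pull_arm(rho, lam, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probs(rho, lam)
    return rng.choices(range(len(rho)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rho = [0.2, 0.5, 0.9]           # made-up values for illustration
    print(softmax_probs(rho, 0.0))  # lam = 0: uniform over arms
    print(softmax_probs(rho, 50.0)) # large lam: mass concentrates on the best arm
    print(pull_arm(rho, 2.0))
```

With `lam = 0` every weight is `exp(0) = 1`, which gives the uniform case; as `lam` grows, the weight of every arm other than the best one shrinks toward zero, which gives the greedy case.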
disambiguation: regular expression
Last edited: August 8, 2025
Regular expressions are the same underlying thing, but can be treated in two senses:
- the tool: regular expression (string processing)
- complexity theory: regular expression (complexity)
discourse features
Last edited: August 8, 2025
Discourse features are markers of fluency, etc., that characterize one’s speech.
Discourse-Completion Task
Last edited: August 8, 2025
A Discourse-Completion Task is a tool used to elicit speech acts, e.g. by showing an image. For instance,
types of Discourse-Completion Tasks
discrete distribution
Last edited: August 8, 2025
A distribution over a discrete set of outcomes: a die roll, a coin flip, etc.
We use a probability mass function to model such a distribution:
\begin{equation} \sum_{i=1}^{n}P(X=i) = 1 \end{equation}
To each outcome of the distribution, we assign a factor: its probability. The parameters of the distribution are the probability values you assign to each outcome.
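As a small sketch of the idea above, a probability mass function can be stored as a mapping from outcome to probability. The representation (a `dict`) and the helper names `is_valid_pmf` and `sample` are assumptions made for illustration; the fair die is just an example distribution.

```python
import random

# A fair six-sided die: the parameters are the six probability values.
die = {face: 1.0 / 6.0 for face in range(1, 7)}


def is_valid_pmf(pmf, tol=1e-9):
    """Check that all probabilities are non-negative and sum to 1."""
    values = list(pmf.values())
    return all(p >= 0.0 for p in values) and abs(sum(values) - 1.0) <= tol


def sample(pmf, rng=random):
    """Draw one outcome according to the probabilities in pmf."""
    outcomes = list(pmf)
    weights = [pmf[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(is_valid_pmf(die))
    print(sample(die))
```

The validity check is exactly the normalization constraint \(\sum_{i=1}^{n}P(X=i) = 1\), with a small tolerance for floating-point rounding.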
