ConDef Abstract
Last edited: August 8, 2025
Current automated lexicography (term definition) techniques cannot include contextual or new term information as a part of their synthesis. We propose a novel data harvesting scheme leveraging lead paragraphs in Wikipedia to train automated context-aware lexicographical models. Furthermore, we present ConDef, a fine-tuned BART trained on the harvested data that defines vocabulary terms from a short context. ConDef is shown to be highly accurate in context-dependent lexicography, as validated by ROUGE-1 and ROUGE-L measures on a 1,000-item withheld test set, achieving scores of 46.40% and 43.26%, respectively. Furthermore, we demonstrate that ConDef’s syntheses serve as good proxies for term definitions by achieving a ROUGE-1 measure of 27.79% directly against gold-standard WordNet definitions.
Accepted to the 2022 SAI Computing Conference, to be published in Springer Nature’s Lecture Notes in Networks and Systems.
conditional Gaussian model
Last edited: August 8, 2025
Say you have one continuous variable \(X\) and one discrete variable \(Y\), and you want to express the probability of \(X\) conditioned on \(Y\) using a Gaussian model, with one Gaussian per value of \(Y\):
\begin{equation} p(x|y) = \begin{cases} \mathcal{N}(x \mid \mu_{1}, \sigma_{1}^{2}), & y^{1} \\ \dots \\ \mathcal{N}(x \mid \mu_{n}, \sigma_{n}^{2}), & y^{n} \end{cases} \end{equation}
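As a minimal sketch of the idea, the model can be read as a lookup table from each value of \(Y\) to its own mean and standard deviation. The parameter values in `PARAMS` and the function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameters (mu_i, sigma_i), one pair per discrete value of Y.
PARAMS = {1: (0.0, 1.0), 2: (5.0, 2.0), 3: (-3.0, 0.5)}


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def conditional_gaussian(x, y, params=PARAMS):
    """p(x | y): pick the Gaussian whose parameters belong to the value y."""
    mu, sigma = params[y]
    return gaussian_pdf(x, mu, sigma)


if __name__ == "__main__":
    # The same x has a different density under each value of Y.
    for y in PARAMS:
        print(y, conditional_gaussian(0.0, y))
```

The model therefore needs \(2n\) parameters in total: one mean and one variance for each of the \(n\) values of \(Y\).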
conditional plan
Last edited: August 8, 2025
A conditional plan is a POMDP representation technique. We can represent a conditional plan as a tree.
toy problem
crying baby POMDP problem:
- actions: feed, ignore
- reward: negative reward if the baby is hungry
- state: two states, the baby is either hungry or not
- observation: noisy crying (she may be crying because she’s genuinely hungry, or crying just for kicks)
formulate a conditional plan
We can create a conditional plan by generating a tree that branches on each possible observation, so its size grows exponentially with depth. This is a policy which tells you what you should do given the sequence of observations you get, with no knowledge of the underlying state.
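A minimal sketch of such a tree for the crying baby problem is below. Each node holds an action, and each child is keyed by the observation received after taking it. The observation labels `"crying"` and `"quiet"`, the helper names, and the rule "feed after hearing crying" are assumptions made for illustration, not an optimal policy.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed observation labels for the noisy crying signal.
OBSERVATIONS = ("crying", "quiet")


@dataclass
class ConditionalPlan:
    action: str  # action to take at this node
    subplans: Dict[str, "ConditionalPlan"] = field(default_factory=dict)


def choose_action(history):
    """Hypothetical rule: feed if the most recent observation was crying."""
    return "feed" if history and history[-1] == "crying" else "ignore"


def build_plan(depth, history=()):
    """Build a plan of the given depth, branching on every observation."""
    node = ConditionalPlan(choose_action(history))
    if depth > 1:
        for obs in OBSERVATIONS:
            node.subplans[obs] = build_plan(depth - 1, history + (obs,))
    return node


def count_nodes(plan):
    """Total number of nodes (one action each) in the plan tree."""
    return 1 + sum(count_nodes(sub) for sub in plan.subplans.values())


def execute(plan, observations):
    """Follow the tree along a sequence of observations; return the actions."""
    actions = [plan.action]
    for obs in observations:
        plan = plan.subplans[obs]
        actions.append(plan.action)
    return actions


if __name__ == "__main__":
    plan = build_plan(3)
    print(count_nodes(plan))
    print(execute(plan, ["crying", "quiet"]))
```

Note that `execute` only ever looks at observations, never at whether the baby is actually hungry. With two observations, a plan of depth \(d\) has \(2^{d} - 1\) nodes, which is where the exponential growth comes from.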
conditions in the Great Depression
Last edited: August 8, 2025
The Great Depression caused many changes in living conditions:
- by 1932, 1/4 of the workforce had no work
- emigration exceeded immigration
- decrease in the American birth rate
- increase of mental illness and suicide
- people created Hoovervilles
- movies and radio became much more popular
confidence interval
Last edited: August 8, 2025
proportional confidence intervals
We measure a single statistic from a sample of a large population, and call it the point estimate. For a proportion, this is usually denoted as \(\hat{p}\).
Given a sample proportion \(\hat{p}\) (for example, "95% of the sample"), the \(95\%\) confidence interval is the range within roughly \(2\sigma\) of \(\hat{p}\) that plausibly contains the true population proportion.
Therefore, given \(\hat{p}\), the confidence interval is:
\begin{equation} \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \end{equation}
where \(n\) is the sample size, \(\hat{p}\) is the point estimate, and \(z^*\) is the critical value, the z-score for the desired confidence level (\(z^*=1.96\) for \(95\%\) confidence).
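The formula above can be sketched directly in code. The function name and the example numbers (\(\hat{p} = 0.5\), \(n = 100\)) are assumptions for illustration only.

```python
import math


def proportion_confidence_interval(p_hat, n, z_star=1.96):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    # Hypothetical sample: half of 100 respondents answered "yes".
    low, high = proportion_confidence_interval(0.5, 100)
    print(round(low, 3), round(high, 3))  # prints 0.402 0.598
```

Because \(n\) sits under the square root, quadrupling the sample size only halves the width of the interval.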