morpheme
Last edited: August 8, 2025
A morpheme is the smallest meaning-bearing unit of a language: “-er”, “-ist”, etc. Morphemes come in two kinds:
- stems: core meaning-bearing units, and
- affixes: parts that attach to stems
For languages that are not space-delimited, tokenization happens at the morpheme/word (“词”) level.
Consider:
姚明进入总决赛 (“Yao Ming enters the finals”)
Are yao/ming (surname and given name) separated? Is zong combined with juesai (i.e. ADJ vs. NOUN)?
Commonly, Chinese is tokenized at the word level if you don’t want to deal with this ambiguity; the segmentation typically uses neural sequence models.
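The segmentation ambiguity above can be sketched with greedy forward maximum matching against a dictionary. The mini-lexicon and `MAX_WORD_LEN` below are toy assumptions for illustration, not a real resource (real systems use neural sequence models, as noted above).

```python
# Greedy forward maximum matching: at each position, take the longest
# dictionary word; fall back to a single character if nothing matches.
LEXICON = {"姚明", "姚", "明", "进入", "总", "决赛", "总决赛"}  # toy dictionary
MAX_WORD_LEN = 3

def segment(text: str) -> list[str]:
    words, i = [], 0
    while i < len(text):
        # try candidate spans from longest to shortest
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if cand in LEXICON or length == 1:
                words.append(cand)
                i += length
                break
    return words

print(segment("姚明进入总决赛"))  # → ['姚明', '进入', '总决赛']
```

Note the greedy longest-match policy is exactly what resolves the ambiguities above: it keeps 姚明 together rather than 姚 + 明, and prefers 总决赛 over 总 + 决赛.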
morphism
Last edited: August 8, 2025
A morphism is a not-necessarily-invertible map between two objects of a category. If the map is indeed invertible, then we call it an isomorphism.
morphological parsing
Last edited: August 8, 2025
Recall that morphemes are the smallest meaningful units of a word.
morphological parsing is the act of recovering the morphemes: cats => cat + s, i.e.
- stem +
- affix
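A toy sketch of morphological parsing as stem-plus-affix lookup; the stem lexicon and suffix inventory below are made-up assumptions, not real linguistic resources.

```python
# Parse a word into (stem, affix) by validating the stem against a lexicon.
STEMS = {"cat", "dog", "walk"}        # toy stem lexicon
SUFFIXES = ("ing", "ed", "s")         # toy affix inventory, longest first

def parse(word: str):
    for suf in SUFFIXES:
        if word.endswith(suf) and word[: -len(suf)] in STEMS:
            return word[: -len(suf)], suf   # stem + affix
    if word in STEMS:
        return word, ""                     # bare stem, no affix
    return None                             # no valid parse

print(parse("cats"))    # → ('cat', 's')
print(parse("walked"))  # → ('walk', 'ed')
```

Checking the stem against a lexicon is what separates parsing from stemming: a split is only returned if the remainder is a real stem.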
stemming
Stemming just chops off the affixes, leaving the stems, without lemmatization: “heights” => “heigh”.
This increases recall (more of what we want to catch is caught) at the cost of precision (what we catch includes many false positives).
For languages with complex conjugation or morphology, this can’t work, because you can’t just chop.
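A naive chop-the-suffix stemmer, assuming a hand-picked toy suffix list, shows why this trades precision away: with no lexicon check, it happily emits non-words.

```python
# Blindly strip the first matching suffix (longest first); no lexicon check.
SUFFIXES = ("ings", "ing", "ed", "ts", "s")  # toy list, illustrative only

def stem(word: str) -> str:
    for suf in SUFFIXES:
        # keep at least a 3-character remainder so tiny words survive
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

print(stem("heights"))  # → 'heigh' (not a real word)
print(stem("running"))  # → 'runn'  (over-chopped)
```

Unlike the parsing sketch above, nothing validates the remainder, which is exactly the recall-over-precision trade-off described here.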
Multi-Agent RL
Last edited: August 8, 2025
Multi-LSTM for Clinical Report Generation
Last edited: August 8, 2025
Take X-rays and generate clinical reports.
Method
encoder decoder architectures
Encoder
ConViT: convolutional vision transformer. The special thing: we swap out the attention for the double-weighted variant.
Double Weighted Multi-Head Attention
We want to force each head to focus on one thing, so we modulate each head based on the weights of the others: if one head’s weight is big, we make the other heads’ weights small.
\begin{equation} w = w_{a} \cdot (1 - w_{\cos}) \end{equation}
where \(w_{\cos} = \frac{1}{N} \sum_{i} \cos \qty(att_{i}, att_{base})\)
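A minimal numpy sketch of this modulation, assuming attention maps flattened to vectors and the similarity averaged against the other ("base") heads; the function names and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two flattened attention maps
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def double_weight(att_a, base_heads):
    # w_cos: mean cosine similarity between this head and the base heads
    w_cos = np.mean([cosine(att_a, b) for b in base_heads])
    # the head's weight w_a gets multiplied by (1 - w_cos)
    return 1.0 - w_cos

rng = np.random.default_rng(0)
head = rng.random(16)
others = [rng.random(16) for _ in range(3)]
print(double_weight(head, others))  # smaller when this head is redundant
```

A head identical to the base heads gets a factor near 0 (fully suppressed), while a dissimilar head keeps a factor near 1, which is the intended push toward head diversity.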
