basis
Last edited: August 8, 2025
A basis is a list of vectors in \(V\) that spans \(V\) and is linearly independent.
constituents
- a LIST! of vectors in vector space \(V\)
requirements
- the list is…
- linearly independent
- spans \(V\)
additional information
criteria for basis
A list \(v_1, \dots, v_{n}\) of vectors in \(V\) is a basis of \(V\) if and only if every \(v \in V\) can be written uniquely as:
\begin{equation} v = a_1v_1+ \dots + a_{n}v_{n} \end{equation}
where \(a_1, \dots, a_{n} \in \mathbb{F}\).
forward direction
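Suppose \(v_1, \dots, v_{n}\) is a basis of \(V\). We want to show that every \(v \in V\) can be written uniquely as a linear combination of \(v_1, \dots, v_{n}\). Because the list spans \(V\), at least one such representation exists; because the list is linearly independent, subtracting two representations of the same \(v\) gives a linear combination equal to \(0\), which forces the coefficients to agree.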
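The criterion can be tried out numerically. The sketch below is only an illustration, under the assumption that \(V = \mathbb{F}^{n}\) with \(\mathbb{F}\) taken to be the rationals; the function name `coordinates` is made up for this example. It runs Gauss-Jordan elimination to find the coefficients \(a_1, \dots, a_{n}\), and returns `None` when the list is not a basis.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Return [a_1, ..., a_n] with v = a_1 v_1 + ... + a_n v_n.

    Returns None when `basis` is not a basis of F^n (here F = rationals).
    """
    n = len(v)
    if len(basis) != n or any(len(b) != n for b in basis):
        return None  # a basis of F^n has exactly n vectors, each of length n

    # Augmented matrix [v_1 ... v_n | v], with the list's vectors as columns.
    rows = [
        [Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
        for i in range(n)
    ]

    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column: the list is dependent
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]

    # The left block is now the identity, so the last column holds a_1..a_n.
    return [rows[i][n] for i in range(n)]


# (1, 1), (1, -1) is a basis of F^2: the coefficients exist and are unique.
print(coordinates([(1, 1), (1, -1)], (3, 1)))
# (1, 2), (2, 4) is linearly dependent, so it is not a basis.
print(coordinates([(1, 2), (2, 4)], (1, 0)))
```

Exact rational arithmetic is used instead of floating point so that the "is this pivot zero" test is not affected by rounding.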
basis of domain
Last edited: August 8, 2025
Suppose \(v_1, \dots, v_{n} \in V\) is a basis of some vector space \(V\), and \(w_1, \dots, w_{n} \in W\) is just a good ol' list of length \(n = \dim V\) in \(W\).
There exists a unique linear map \(T \in \mathcal{L}(V,W)\) such that…
\begin{equation} Tv_{j} = w_{j} \end{equation}
for each \(j = 1, \dots, n\).
Intuition
The layperson's explanation of this result: 1) wherever you want to send the basis vectors of one space, there is always a unique linear map that sends them there; 2) a linear map is determined uniquely by what it does to a basis of its domain.
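A small sketch of point 2, under stated assumptions: vectors in \(V\) are represented by their coordinate lists \([a_1, \dots, a_{n}]\) relative to the basis \(v_1, \dots, v_{n}\), and vectors in \(W\) by plain lists of numbers. The helper name `make_linear_map` and the polynomial example are illustrative and not part of the original statement. Once the images \(w_{j} = Tv_{j}\) are chosen, linearity leaves no freedom: \(T(a_1v_1 + \dots + a_{n}v_{n}) = a_1w_1 + \dots + a_{n}w_{n}\).

```python
def make_linear_map(images):
    """Build T from the images w_1..w_n of a basis v_1..v_n.

    T takes the coordinate list [a_1, ..., a_n] of a vector
    v = a_1 v_1 + ... + a_n v_n and returns a_1 w_1 + ... + a_n w_n.
    """
    dim_w = len(images[0])

    def T(coords):
        if len(coords) != len(images):
            raise ValueError("need one coordinate per basis vector")
        return [
            sum(a * w[i] for a, w in zip(coords, images))
            for i in range(dim_w)
        ]

    return T


# Illustrative choice: V = polynomials of degree <= 2 with basis 1, x, x^2,
# W = polynomials of degree <= 1 with basis 1, x.
# Sending 1 -> 0, x -> 1, x^2 -> 2x pins down differentiation on all of V.
derivative = make_linear_map([[0, 0], [1, 0], [0, 2]])

# 3 + 5x + 4x^2 has coordinates [3, 5, 4]; its derivative is 5 + 8x.
print(derivative([3, 5, 4]))
```

Choosing a different list `images` gives a different map, and two maps built from the same list agree on every input, which is the uniqueness half of the statement.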
batchalign
Last edited: August 8, 2025
Batchalign Benchmarking
Last edited: August 8, 2025
Batchalign Morphosyntax
Last edited: August 8, 2025
We now describe the procedure used to perform morpho-syntactic analysis, which extracts morphological and dependency information, including the morphological features used in this analysis. The core facilities for neural morpho-syntax are provided by the Stanza package (Qi et al. 2020), on top of which we perform a number of customizations to support the analysis functionality needed for this work.
The process of morpho-syntactic analysis occurs in five basic steps:
1) performance of raw language sample analysis (LSA) using the automatic speech recognition (ASR) and utterance segmentation tools already built into the Batchalign system (Liu et al. 2023)
2) use of the Stanza tokenizer and multi-word token (MWT) recognizer to obtain an initial word-level tokenization of each utterance
3) programmatic, language-specific correction of this tokenization, especially pertaining to multi-word tokens (MWTs) and multi-word forms
4) invocation of the rest of the Stanza neural pipeline for morphology, dependency, and feature extraction
5) programmatic extraction and correction of output features after the Stanza pipeline
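The Stanza-facing steps above can be sketched roughly as follows. This is a simplified illustration and not Batchalign's actual code: it collapses steps 2 and 4 into a single Stanza pipeline call, and omits the ASR step and the language-specific corrections entirely. The function names `parse_feats` and `analyze` are hypothetical. The sketch assumes Stanza is installed, that models for the chosen language have already been downloaded (for example with `stanza.download`), and that the language ships an MWT model.

```python
def parse_feats(feats):
    """Turn a UD feature string such as 'Number=Sing|Person=3' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def analyze(utterance, lang):
    """Run tokenization, MWT expansion, tagging, and parsing on one utterance."""
    # Imported lazily so parse_feats can be used without Stanza installed.
    import stanza

    # Assumption: models for `lang` are already downloaded and include MWT.
    nlp = stanza.Pipeline(
        lang=lang,
        processors="tokenize,mwt,pos,lemma,depparse",
    )
    doc = nlp(utterance)

    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append(
                {
                    "text": word.text,
                    "lemma": word.lemma,
                    "upos": word.upos,
                    "feats": parse_feats(word.feats),  # feature extraction
                    "head": word.head,  # index of the governor, 0 for root
                    "deprel": word.deprel,
                }
            )
    return rows
```

A call such as `analyze("Je parle du livre.", "fr")` would return one dictionary per syntactic word, with the contraction "du" expanded by the MWT processor. A language-specific correction pass like step 3 would sit between tokenization and tagging, which this single-call sketch does not attempt.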