_index.org

batchalign

Last edited: August 8, 2025

Batchalign Benchmarking


Batchalign Morphosyntax


We now describe the procedure used to perform morpho-syntactic analysis, which extracts morphological and dependency information, including the morphological features used in this analysis. The core facilities for neural morpho-syntax are provided by the Stanza package (Qi et al. 2020), on top of which we implement a number of customizations to support the analysis functionality needed for this work.

The process of morpho-syntactic analysis occurs in five basic steps:

  1. raw language sample analysis (LSA) using the automatic speech recognition (ASR) and utterance segmentation tools already built into the Batchalign system (Liu et al. 2023);
  2. use of the Stanza tokenizer and multi-word token (MWT) recognizer to obtain an initial word-level tokenization of each utterance;
  3. programmatic, language-specific correction of these tokenizations, especially for multi-word tokens and multi-word forms;
  4. invocation of the rest of the Stanza neural pipeline for morphology, dependency, and feature extraction;
  5. programmatic extraction and correction of the output features after the Stanza pipeline.
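Step 3 can be illustrated with a minimal sketch. The rule table and the function name below are hypothetical, purely for illustration, and do not reflect Batchalign's actual correction rules:

```python
# Hypothetical sketch of language-specific MWT correction (step 3):
# after initial tokenization, known multi-word tokens are expanded
# into their component words before the rest of the pipeline runs.

# Illustrative English clitic splits, keyed on the surface form.
MWT_RULES = {
    "don't": ["do", "n't"],
    "it's": ["it", "'s"],
    "gonna": ["gon", "na"],
}

def correct_mwt(tokens):
    """Expand known multi-word tokens into their component words."""
    corrected = []
    for token in tokens:
        # Unknown tokens pass through unchanged.
        corrected.extend(MWT_RULES.get(token.lower(), [token]))
    return corrected

print(correct_mwt(["I", "don't", "know"]))
# → ['I', 'do', "n't", 'know']
```

In the real pipeline, corrections like these would be applied between Stanza's tokenizer/MWT recognizer and the downstream morphology and dependency processors.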

Batchalign Paper Outline


Things to include

  • Rev
  • How to handle interspersed results
  • Utterance segmentation
  • Why --prealigned and the overall performance of MFA
  • Beginning/End Bullet and why we throw away Rev’s output
  • fixbullets and manual utterance segmentation
  • &*INV= interspersed comments

Bayes Normalization Constant


In a Bayesian network setting, Bayes' rule expresses the posterior as:

\begin{equation} P(A|M) = \frac{P(M|A)P(A)}{P(M)} \end{equation}

If we are only interested in the posterior as a function of the different values of \(a\), the constant \(P(M)\) is not needed. We can instead compute the unnormalized value \(P(M|A)P(A)\) for each \(a\), and then normalize so the results sum to 1:

\begin{equation} P(A|M) \propto P(M|A)P(A) \end{equation}

That is, after calculating \(P(M|A)P(A)\) for each value of \(a\), we simply divide by their sum so that the posterior sums to one.
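The normalization trick can be sketched numerically. The prior and likelihood values below are made up for illustration:

```python
# Compute the posterior P(A|M) without ever evaluating P(M) directly:
# form the unnormalized products P(M|a)P(a) for each value a, then
# divide by their sum so the results sum to 1.

prior = {"a1": 0.5, "a2": 0.3, "a3": 0.2}       # P(A = a), illustrative
likelihood = {"a1": 0.9, "a2": 0.1, "a3": 0.4}  # P(M | A = a), illustrative

unnormalized = {a: likelihood[a] * prior[a] for a in prior}
total = sum(unnormalized.values())   # plays the role of P(M)
posterior = {a: v / total for a, v in unnormalized.items()}

print(posterior)
assert abs(sum(posterior.values()) - 1.0) < 1e-9
```

The sum of the unnormalized products is exactly the marginal \(P(M)\) when \(a\) ranges over all values of \(A\), which is why dividing through by it recovers the true posterior.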