whole metalanguage study
Last edited: August 8, 2025
A study with the goal of identifying semantic primes.
Why is building a to-do list app so darn hard?
Last edited: August 8, 2025
Why are Todo Lists (a.k.a. personal productivity systems) so hard to build well?
I’m genuinely curious. I was listening to the last episode of Cortex, and one of the hosts (CGP Grey) brought up a similar point regarding personal productivity platforms. OmniFocus, the reigning champion of the industry for professionals looking for a deeply customized system, has been struggling to ship the next version of its application. Much of the market consists of different packagings of the same offering. Grey’s thesis about these platforms essentially boils down to this:
window-based co-occurrence
Last edited: August 8, 2025
window-based co-occurrence is a matrix in which each row is a center word and each column is a context word; each entry counts how many times the column word occurs within a fixed-size window around the row word.
This approach is fine (not great), but if your vocabulary is HUGE, your word vectors will be exactly that long, which is bad. Therefore, we take the SVD of this matrix and chop off the smaller singular values to create a low-dimensional approximation of it.
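A minimal sketch of the idea in Python, assuming NumPy is available. The function names (`cooccurrence_matrix`, `truncated_svd_vectors`), the toy sentence, and the window size of 1 are all illustrative choices rather than anything from the original note.

```python
import numpy as np


def cooccurrence_matrix(tokens, window=1):
    """Count, for each center word (row), how often each context word
    (column) appears within `window` positions of it."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=float)
    for pos, center in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # skip the center word itself
                counts[index[center], index[tokens[ctx_pos]]] += 1
    return vocab, counts


def truncated_svd_vectors(counts, k):
    """Keep only the k largest singular values; each row of the result
    is a k-dimensional vector for the corresponding vocabulary word."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


# Toy corpus (an assumption for illustration only).
tokens = "i like deep learning i like nlp i enjoy flying".split()
vocab, counts = cooccurrence_matrix(tokens, window=1)
vectors = truncated_svd_vectors(counts, k=2)
```

Here every word ends up with a 2-dimensional vector instead of one as long as the vocabulary. How large `k` should be in practice is a tuning decision that this sketch does not address.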
Windows FAT
Last edited: August 8, 2025
A linked-files architecture for the filesystem, but one that caches the file links (the file allocation table) in memory while the OS is running.
problems
- data is still scattered across the disk
- we had to construct the file allocation table
- though it's much faster because jumping to the middle of the file now happens in memory, we are still doing an O(n) search for a specific part of the file
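To make the last problem concrete, here is a toy model of the in-memory table, assuming it maps each block number to the next block of the same file. The `END_OF_FILE` sentinel, the `nth_block` helper, and the block numbers are hypothetical and do not reflect the real on-disk FAT layout.

```python
END_OF_FILE = -1  # hypothetical sentinel marking the last block of a file


def nth_block(fat, start_block, n):
    """Return the disk block holding the n-th block (0-based) of a file.

    The walk happens entirely in memory, but it still follows the chain
    one link at a time, so it costs O(n) steps.
    """
    block = start_block
    for _ in range(n):
        block = fat[block]
        if block == END_OF_FILE:
            raise IndexError("file has fewer blocks than requested")
    return block


# One file scattered across disk blocks 7 -> 2 -> 9 -> 4.
fat = {7: 2, 2: 9, 9: 4, 4: END_OF_FILE}
third = nth_block(fat, start_block=7, n=2)  # follows 7 -> 2 -> 9
```

The scattered block numbers in `fat` also illustrate the first problem: the table speeds up link-following, but the data itself is no more contiguous on disk.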
Word Normalization
Last edited: August 8, 2025
Pay attention to:
- cases (all letters to lower case?)
- lemmatization
This is often done with morphological parsing; for instance, you can try stemming.
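As a rough illustration of both points, the sketch below lowercases a word and then strips one common English suffix. The suffix list and the `normalize` helper are assumptions made for this example. It is a crude stand-in for stemming, not the Porter stemmer, and it is much weaker than real lemmatization.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical and far from complete


def normalize(word):
    """Lowercase a word, then strip the first matching suffix as long as
    at least three characters of stem remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["Walking", "Jumped", "Boxes", "cats", "is"]
stems = [normalize(w) for w in words]
```

A rule list like this breaks easily (irregular forms such as "ran" are untouched), which is why a morphological parser or an established stemmer is usually preferred.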