👋 Howdy, I'm Houjun Liu!

I’m a first-year undergraduate student in the Computer Science Department at Stanford University, advised by Prof. Mykel Kochenderfer. I’m broadly interested in Natural Language Processing and Speech Language Sample Analysis. Recently, I’ve been digging into 1) building better tools for language and speech processing to make state-of-the-art classic NLP tasks accessible and useful, 2) designing improved algorithmic approaches to large language model training and decoding to improve multi-hop decisions and conversation tracking, and 3) exploring applications of these models in applied domains such as dementia and emergency medicine.

Welcome to my academic homepage! This is my little homestead on the internet for my academic interests. Check out my projects below. If you want to know more about the rest of my life, feel free to visit my website!

Recent goings-on
  • Jul. 14, '24: arXiv preprint released on LM red teaming.
  • May 18, '24: Journal article (NACC) published at LWW AD.
  • Apr. 29, '24: arXiv preprint released on POMDP LM decoding.
  • Mar. 15, '24: TalkBank's Mandarin utterance segmentation model released.
  • Feb. 26-27, '24: AAAI 2024. Vancouver was fun!
  • Jun. 22, '23: Paper (Batchalign) published in JSLHR.

Shoot me an email at [firstname] at stanford dot edu, or, if you are around Stanford, grab dinner with me :)

About

I am a research engineer at the TalkBank Project at CMU under the supervision of Prof. Brian MacWhinney, where I develop better models and tools for clinical language sample analysis. I also work with the Stanford NLP Group, under the direction of Prof. Chris Manning, on both mechanistically understanding language models and using them to efficiently solve semantic and syntactic tagging tasks with Stanza. Finally, I am a research assistant with Prof. Xin Liu at UC Davis and UC Davis Health, where I use transformer models to advance our understanding of dementia.

In industry, I lead the development of Condution and simon, and I am a managing partner at #!/Shabang. Previously, I worked as a consulting ML engineer on the AI Operations team at Dragonfruit AI.

Projects

SISL (2024)
ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts
Hardy, A., Liu, H., Lange, B., Kochenderfer, M. J.
UC Davis Health (2024)
A Transformer Approach to Cognitive Impairment Classification and Prediction
Liu, H., Weakley, A.M., Zhang, J., Liu, X.
Preprint (2024)
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
Liu, H.
TalkBank (2023)
Automation of Language Sample Analysis
Liu, H., MacWhinney, B., Fromm, D., Lanzi, A.
TalkBank (2023)
DementiaBank: Theoretical Rationale, Protocol, and Illustrative Analyses
Lanzi, A., Saylor, A.K., Fromm, D., Liu, H., MacWhinney, B., Cohen, M.L.
Nueva (2022)
ConDef: Automated Context-Aware Lexicography Using Large Online Encyclopedias
Liu, H., Sayyah, Z.
Preprint (2021)
Towards Automated Psychotherapy via Language Modeling
Liu, H.

Teaching

  • 2023- Teaching Assistant, Stanford Association for Computing Machinery (ACM) Chapter
  • 2022-2023 Head TA (co-instructor and summer lecturer) at AIFS AIBridge, a program funded by UC Davis Food Science
  • 2021-2023 Co-Developer, Research@Nueva, a high-school science education program

Course Notes

Some folks have mentioned that my course notes, from classes at Stanford and before, have been helpful. Feel free to browse my Stanford UG Courses Index if you want to check them out!

© 2019-2024 Houjun Liu. Licensed CC BY-NC-SA 4.0. This website layout is inspired by the lovely homepage of Karel D’Oosterlick.