Hi, I'm Ariuka, a recent MSc graduate with a conference paper at the MRL workshop at EMNLP 2025 and a public feature at Matfyz, Charles University. My work combines structural probing, sparse autoencoders, and causal interventions to analyze syntax representations in multilingual transformer models. I bridge research and practical engineering: designing experiments, implementing reproducible pipelines, and producing interpretable analyses that inform model design and deployment. I am actively seeking an NLP/AI position to advance research-to-production work in deep learning and model interpretability.
October 2023 – (Expected) November 2025
Multimodal Language Understanding, Computational Psycholinguistics, Speech Science, Mechanistic Interpretability, AI Software Project
October 2023 – September 2025
Deep Learning, Deep Reinforcement Learning, Unsupervised ML in NLP, Statistical Methods in NLP, Complexity and Computability, Data Structures, Speech Recognition and Generation, Morphological and Syntactic Analysis, Language Data Resources, NLP Applications, General Linguistics
September 2018 – Spring 2022
Bachelor of Science, Software Engineering
Graduated summa cum laude with the Most Advanced Graduate Award. Average grade: 3.8/4.0 (top 1%)
Ariun-Erdene Tumurchuluun, Yusser Al Ghussin, David Mareček, Josef van Genabith, Koel Dutta Chowdhury
TL;DR: We introduce a multilingual tense-annotated corpus and apply mechanistic interpretability tools to Llama-3.1 8B to identify a causally active subspace for tense and monosemantic tense features, which we validate by steering generation output.
Accepted at the MRL workshop at EMNLP 2025 (peer-reviewed), to appear
Implemented diagnostic classifiers to measure multilingual BERT’s sensitivity to word-order variations in typologically diverse languages. Evaluated performance on custom multilingual corpora derived from Universal Dependencies annotations.
Built an HMM‐based speech recognizer using HTK: prepared and normalized a custom speech corpus, extracted MFCC features, trained monophone and triphone models with Baum–Welch reestimation, applied CMVN and VTLN. Evaluated WER on held-out test sets and packaged a live transcription demo using HVite and N‐gram decoding.
Developed a finite‐state analyzer with Foma: encoded lexicon entries (root, inflection class) to generate full paradigms for nouns and adjectives. Identified phonological alternations affecting feminine noun inflections and adjective agreement patterns.
Collected and preprocessed IMDB reviews (HTML stripping, tokenization). Annotated the dataset and computed inter‐annotator agreement (63% observed, κ=0.172). Fine‐tuned BERT for sentiment classification and evaluated model accuracy on a balanced test set.
Computed Jaccard similarity across Swadesh vocabularies for English, Spanish, French, Italian, Slovak, Romanian, Latin, Catalan, Macedonian, and Swahili. Visualized pairwise similarity via heatmap; observed low overlap (max 7%), with notable clusters for Romance languages.
Analyzed Glottolog data to explore language name length distribution and family sizes. Generated histograms showing most names around ten characters and identified language isolates (Ainu, Yokutsan, Nivkh, Chiquitano) within global family distribution.
Calculated conditional entropy of English and Czech text corpora under varying “mess‐up” probabilities (0%–10%). Demonstrated sensitivity to character‐level errors: baseline entropy 5.29 (English) and 4.75 (Czech). Visualized entropy trends as noise increased.
Evaluated NLTK’s Brill tagger and implemented supervised and unsupervised HMM taggers on English and Czech corpora. Achieved 86.9% accuracy for English and 60.7% for Czech with Brill. Applied Baum–Welch reestimation for the HMM taggers; performance was within 5% of Brill’s accuracy, validating HMM viability for morphologically rich languages.
June 2021 – July 2023
Designed and maintained Django‐Vue microservices. Configured AWS EC2 and GCP clusters for scalable ML inference. Developed REST APIs for data pipelines. Automated CI/CD with GitLab CI. Coordinated requirements and sprint planning using Jira.
June 2020 – December 2020
Instructed high school students in HTML, CSS, JavaScript, and Java programming. Created lesson plans, assessed progress, and provided feedback to improve student comprehension.
March 2022 – July 2023
Developed full-stack features for secondary market microservices. Learned and applied PHP (Laravel, Lumen, Blade, Orchid) to implement underwriting and brokerage modules. Wrote unit tests; maintained CI/CD. Deployed via Docker and Kubernetes; integrated Redis for caching and Google Cloud Functions for serverless tasks.
PHP, Laravel, Lumen, Blade, Orchid, Express JS, Firebase, Google Cloud Functions, Redis
March 2022 – July 2023
Built front-end features in Vue.js for IPO ordering and credential management. Implemented OAuth 2.0 SSO with Firebase Authentication and Google Cloud Functions. Migrated from Keycloak to Firebase Auth to support a growing user base.
PHP, Lumen, Vue JS, Express JS, Firebase, Google Cloud Functions, Redis, Centrifuge
October 2021 – July 2023
Maintained a Django‐based crowdfunding platform. Triaged and resolved bugs; reviewed and merged pull requests. Implemented new features in Django REST framework. Coordinated with cross-functional teams to align development with business requirements.
Python, Django, Django REST, Vue JS