Ariun-Erdene Tumurchuluun

Master of Science in Language & Communication Technologies

Hi, I'm Ariuka, a recent MSc graduate with a conference paper at the MRL workshop at EMNLP 2025 and a public feature at Matfyz, Charles University. My work combines structural probing, sparse autoencoders, and causal interventions to analyze syntactic representations in multilingual transformer models. I bridge research and practical engineering: designing experiments, implementing reproducible pipelines, and producing interpretable analyses that inform model design and deployment. I am actively seeking an NLP/AI position to advance research-to-production work in deep learning and model interpretability.

Education

Universität des Saarlandes, Saarbrücken, Germany

October 2023 – (Expected) November 2025

  • Master of Science, Language Science and Technology
  • Faculty of Mathematics and Computer Science
  • GPA: 2.0/5.0 (German scale: 1.0 = best, 5.0 = worst)
  • LCT Erasmus Mundus Joint Master's Scholarship (Fully funded)

Multimodal Language Understanding, Computational Psycholinguistics, Speech Science, Mechanistic Interpretability, AI Software Project

Charles University, Prague, Czech Republic

October 2023 – September 2025

  • Master of Science, Language Technologies and Computational Linguistics
  • Faculty of Mathematics and Physics
  • GPA: 1.91/4.0 (Czech scale — 1.0 = best, 4.0 = worst); Summa Cum Laude; Interview; Dean’s List
  • LCT Erasmus Mundus Joint Master's Scholarship (Fully funded)

Deep Learning, Deep Reinforcement Learning, Unsupervised ML in NLP, Statistical Methods in NLP, Complexity and Computability, Data Structures, Speech Recognition and Generation, Morphological and Syntactic Analysis, Language Data Resources, NLP Applications, General Linguistics

National University of Mongolia, Ulaanbaatar, Mongolia

September 2018 – Spring 2022

Bachelor of Science, Software Engineering

Graduated summa cum laude with the Most Advanced Graduate Award. Average grade: 3.8/4.0 (top 1%)

Publications

TenseLoC: Tense Localization and Control in a Multilingual LLM

Ariun-Erdene Tumurchuluun, Yusser Al Ghussin, David Mareček, Josef van Genabith, Koel Dutta Chowdhury

TL;DR: We introduce a multilingual tense-annotated corpus and apply mechanistic interpretability tools to Llama-3.1 8B to identify a causally active subspace for tense and monosemantic tense features, and we validate both by steering generation output.

Accepted at the MRL workshop at EMNLP 2025 (peer-reviewed); to appear

NLP & ML Projects

Analyzing Word Order Encoding in Multilingual BERT

Implemented diagnostic classifiers to measure multilingual BERT’s sensitivity to word-order variations in typologically diverse languages. Evaluated performance on custom multilingual corpora derived from Universal Dependencies annotations.

Speech Recognition System with HTK

Built an HMM‐based speech recognizer using HTK: prepared and normalized a custom speech corpus, extracted MFCC features, trained monophone and triphone models with Baum–Welch reestimation, applied CMVN and VTLN. Evaluated WER on held-out test sets and packaged a live transcription demo using HVite and N‐gram decoding.

Morphological Analyzer for Russian

Developed a finite‐state analyzer with Foma: encoded lexicon entries (root, inflection class) to generate full paradigms for nouns and adjectives. Identified phonological alternations affecting feminine noun inflections and adjective agreement patterns.

Sentiment Analysis on Movie Reviews

Collected and preprocessed IMDB reviews (HTML stripping, tokenization). Annotated dataset; computed inter‐annotator agreement (63% observed, κ=0.172). Fine‐tuned BERT for sentiment classification; evaluated model accuracy on balanced test set.

Swadesh List Similarity Analysis

Computed Jaccard similarity across Swadesh vocabularies for English, Spanish, French, Italian, Slovak, Romanian, Latin, Catalan, Macedonian, and Swahili. Visualized pairwise similarity via heatmap; observed low overlap (max 7%), with notable clusters for Romance languages.

Typological Feature Analysis Using Glottolog

Analyzed Glottolog data to explore language name length distribution and family sizes. Generated histograms showing most names around ten characters and identified language isolates (Ainu, Yokutsan, Nivkh, Chiquitano) within global family distribution.

Entropy & Cross-Entropy in Language Modeling

Calculated conditional entropy of English and Czech text corpora under varying “mess‐up” probabilities (0%–10%). Demonstrated sensitivity to character‐level errors: baseline entropy 5.29 (English) and 4.75 (Czech). Visualized entropy trends as noise increased.

Brill’s Tagger & Unsupervised HMM Tagging

Evaluated NLTK’s Brill tagger and implemented supervised/unsupervised HMM taggers on English and Czech corpora. Achieved 86.9% accuracy for English and 60.7% for Czech with Brill. Applied Baum–Welch reestimation for HMM; performance within 5% of Brill’s accuracy, validating HMM viability for morphologically rich languages.

Industry Experience

Software Engineer, AI Lab LLC

June 2021 – July 2023

Designed and maintained Django‐Vue microservices. Configured AWS EC2 and GCP clusters for scalable ML inference. Developed REST APIs for data pipelines. Automated CI/CD with GitLab CI. Coordinated requirements and sprint planning using Jira.

Teaching Assistant, Empasoft LLC

June 2020 – December 2020

Instructed high school students in HTML, CSS, JavaScript, and Java programming. Created lesson plans, assessed progress, and provided feedback to improve student comprehension.

Tech Experience

Icapital.mn

March 2022 – July 2023

Developed full-stack features for secondary market microservices. Learned and applied PHP (Laravel, Lumen, Blade, Orchid) to implement underwriting and brokerage modules. Wrote unit tests; maintained CI/CD. Deployed via Docker and Kubernetes; integrated Redis for caching and Google Cloud Functions for serverless tasks.

PHP, Laravel, Lumen, Blade, Orchid, Express.js, Firebase, Google Cloud Functions, Redis

Ipo.Icapital.mn

March 2022 – July 2023

Built front-end features in Vue.js for IPO ordering and credential management. Implemented OAuth 2.0 SSO with Firebase Authentication and Google Cloud Functions. Migrated from Keycloak to Firebase Auth to support a growing user base.

PHP, Lumen, Vue.js, Express.js, Firebase, Google Cloud Functions, Redis, Centrifuge

FinBerry.mn

October 2021 – July 2023

Maintained Django‐based crowdfunding platform. Triaged and resolved bugs, reviewed and merged pull requests. Implemented new features in Django REST framework. Coordinated with cross-functional teams to align development with business requirements.

Python, Django, Django REST, Vue.js