Zhaohui Wang
Full Stack Software Engineer / ML Engineer
Professional Summary
Computer Science M.S. candidate at USC with extensive experience in full-stack development and machine learning systems. Proven track record at Meetfood, NSFocus, and Tencent building high-performance backend systems, mobile applications, and AI systems using Python, Java, TypeScript, and Rust. Deep expertise in multi-agent reinforcement learning, knowledge distillation, recommender systems, and high-performance computing, with demonstrated ability to translate cutting-edge AI research into production-ready solutions.
Core Competencies
Technical Skills
Languages: Python, Java, TypeScript, JavaScript, Rust, C/C++, CUDA, SQL
ML & DL: PyTorch, Transformers, PEFT, TRL, LangChain, scikit-learn
ML Specializations: Knowledge Distillation, Model Compression, Neural Architecture Search, Reinforcement Learning
NLP & LLMs: LoRA/QLoRA, RAG, SHAP, Fisher Information, Prompt Engineering
Recommender Systems: DeepFM, AutoInt, DIN, xDeepFM, DCNv2, Transformer4Rec
Web & Backend: Spring Boot, Node.js/Express, FastAPI, Flask, MongoDB, PostgreSQL, Elasticsearch, Redis
Frontend & Mobile: React, React Native, Svelte 5, Tailwind CSS, WebAssembly (Rust/WASM)
HPC & Parallel: CUDA, OpenMP, Ray, AsyncIO, ROS
Cloud & DevOps: AWS (EC2, S3, CloudFront, Lambda), Docker, Kubernetes, CI/CD (GitHub Actions)
Vector DBs & Browser Storage: FAISS, Chroma DB, IndexedDB
Core Strengths
Cross-functional Team Collaboration & Project Management
Full-Stack System Architecture Design
Fast Learner - Rapidly Apply New Technologies to Production
Problem Solving & Analytical Thinking
CI/CD Pipeline Optimization
Technical Documentation & Communication
Professional Experience
Full-Stack Software Engineer Intern
Meetfood | AWS, React Native, TypeScript, Node.js
May 2025 - Present | Los Angeles, CA
- Engineered full-stack mobile application for restaurant discovery platform, leading development from architecture design to deployment
- Built cross-platform mobile UI with React Native, TypeScript, and Jotai state management, implementing responsive design patterns and reusable component library to accelerate feature delivery by 20%
- Developed RESTful backend APIs using Node.js, Express, and MongoDB, handling media uploads, user authentication, and real-time data synchronization for thousands of concurrent users
- Architected scalable video processing pipeline with AWS MediaConvert and CloudFront CDN, optimizing client-side playback with adaptive bitrate streaming and reducing load latency by 20%
- Deployed cloud infrastructure on AWS (EC2, S3, RDS) with automated CI/CD via GitHub Actions and CodePipeline, reducing deployment time by 25% and ensuring zero-downtime releases
- Implemented real-time features including push notifications, live updates, and offline-first architecture with local caching and sync mechanisms
Full-Stack Software Engineer
NSFocus | Java, Python, React, Elasticsearch
Jun 2023 - Jun 2024 | Beijing, China
- Developed full-stack network security analytics platform with React frontend and Spring Boot backend, supporting real-time threat visualization, log analysis, and incident response workflows
- Built interactive dashboards with React, D3.js, and ECharts for security event visualization, implementing real-time updates via WebSocket and optimized rendering for 100K+ data points
- Engineered backend microservices with Spring Boot and Python FastAPI, integrating Elasticsearch for log aggregation and reducing query latency by 35% through query optimization
- Implemented authentication and authorization system with JWT tokens, role-based access control (RBAC), and session management for enterprise multi-tenant environment
- Containerized application stack with Docker and established CI/CD pipelines, automating testing, building, and deployment processes
Backend Engineer Intern
Tencent | Spring Boot, MyBatis, Redis, Docker
Jul 2019 - Aug 2019 | Shenzhen, China
- Developed backend services for enterprise messaging platform using Spring Boot and MyBatis, implementing RESTful APIs with Redis caching to support high-volume traffic
- Optimized asynchronous processing with CompletableFuture and reactive programming patterns, improving throughput by 25% and reducing response time by 30%
- Deployed microservices to Kubernetes (TKE) with health checks, auto-scaling, and monitoring via Prometheus/Grafana, reducing deployment time by 40%
Key Projects & Achievements
LayerwiseAdapter: Multi-Teacher Fusion for Recommendation
PyTorch, CUDA | Jun 2024 - Sep 2024
- Designed and implemented a novel 3-layer adaptive framework for multi-teacher knowledge fusion in recommendation systems, integrating traditional ML algorithms with LLMs through Fisher-guided knowledge distillation
- Achieved RMSE=0.8921 on MovieLens 1M, within 0.0011 of the best individual algorithm AutoInt (0.8910), while maintaining 43.8% parameter reduction
- Implemented 6 SOTA recommendation algorithms (DeepFM, AutoInt, xDeepFM, DIN, DCNv2, Transformer4Rec) with CUDA optimization, achieving 25x faster inference than LLM baseline
- Developed Fisher Information-guided importance analysis for intelligent layer selection and pruning-aware knowledge distillation (PAKD), validated with 75% size reduction and 400% speedup
FisherLD: Fisher-Guided Knowledge Distillation for LLM Compression
PyTorch, Transformers | Oct 2024 - Dec 2024
- Researched and implemented Fisher Information-guided layerwise distillation framework for efficient LLM-based recommendation systems
- Trained 12-layer Transformer baseline on Amazon Electronics reviews (86K samples) achieving 87.53% test accuracy, then compressed to 6-layer student model
- Designed layer importance analysis using Fisher Information Matrix and gradient norms, identifying the 6 most critical layers (1577x higher importance) to support 50% layer reduction
- Compressed student model outperformed the teacher (54.9% vs 54.2%) with 43.8% parameter reduction and 44% model size reduction
Interruptr: Game-Theoretic Multi-Agent Code Analysis
Python, OpenAI API, Ollama | Oct 2024 - Nov 2024
- Architected a game-theoretic heterogeneous multi-agent system with a 3+1 architecture (3 GPT experts + 1 local verifier) targeting a cost-quality Nash equilibrium in code analysis
- Achieved 22% cost reduction ($0.094 vs $0.120) and 13% quality improvement (F1: 0.85 vs 0.75) compared to single GPT-4, with 44.6% efficiency boost
- Designed adversarial verification system using lightweight Qwen3 local model (0.5B) as quality detector, detecting 20%+ LLM hallucinations with less than 5% cost overhead
- Implemented concurrent agent collaboration with async execution, achieving 3x speedup over serial execution without quality degradation
Emotion-Aware Language Models with Audio Augmentation
PyTorch, Transformers, PEFT | Mar 2024 - Jul 2024
- Researched methods for training language models that perceive and respond to emotional context from speech by integrating audio features into LLMs
- Achieved 48% reduction in cross-speaker degradation (41.7x vs 80.2x) by training on real emotional speech (RAVDESS, 24 speakers) vs TTS-generated data
- Identified a Feature-Generalization Paradox in which in-domain performance correlates inversely with cross-speaker robustness, establishing the need for strict cross-speaker evaluation
- Fine-tuned Qwen2.5-1.5B-Instruct with 4-bit quantization and LoRA on emotional speech datasets
Persona-RAG: Memory-Enhanced Multi-Persona Conversational AI
PyTorch, LangChain, FAISS | Jan 2024 - May 2024
- Developed lightweight personality-driven conversational AI combining MBTI personality models with memory-enhanced retrieval (RAG)
- Implemented 3 distinct MBTI personalities using QLoRA fine-tuning on Qwen3-0.6B, achieving 97.3-97.6% token accuracy with only 39MB per personality adapter
- Designed persona-aware memory retrieval algorithm with weighted scoring, achieving better contextual relevance than standard RAG
- Built FAISS-based vector search system achieving <0.1s search time across 100+ memories with multi-user support
Music Style Filter PWA
Rust, WebAssembly, React, TypeScript | Sep 2024 - Dec 2024
- Built production-grade Progressive Web App for real-time audio style filtering with 8 preset styles using DSP algorithms compiled to WebAssembly
- Implemented audio processing engine in Rust with WASM compilation, achieving 5-10x performance improvement over JavaScript through SIMD optimizations
- Architected offline-first PWA with service workers and IndexedDB caching, achieving Lighthouse score 95+ across all metrics
Browser ML Inference Engine
Rust, WASM, React, TypeScript | Jan 2024 - Dec 2024
- Developed production-ready neural network inference engine running entirely in browser using Rust/WebAssembly, supporting 17+ operators
- Engineered high-performance WASM runtime with SIMD acceleration, Web Workers, and INT8 quantization, achieving 10-50x speedup over JavaScript
- Optimized binary size to under 1MB through aggressive optimization, tree shaking, and compression
Vector Database with Browser UI
Rust, WASM, React, IndexedDB | May 2024 - Nov 2024
- Created production-grade browser-based vector database using Rust/WASM with HNSW indexing, supporting 7 distance metrics and persistent storage via IndexedDB
- Implemented SIMD-accelerated vector operations achieving near-native performance in browser environment
- Published npm package with full documentation and example applications
Desktop Monitoring Widget
Rust, Tauri 2.x, Svelte 5 | Aug 2024 - Dec 2024
- Built cross-platform desktop application using Tauri 2.x with Svelte 5 frontend, supporting Windows, macOS, and Linux
- Developed Rust backend for real-time system metrics collection with minimal performance overhead
- Achieved sub-100ms launch time and under 50MB memory footprint through optimization
High-Performance Fractal Rendering Engine
C++, CUDA, OpenMP, WebAssembly | Sep 2024 - Dec 2024
- Designed massively parallel fractal rendering engine achieving 2 billion pixels/sec on dual RTX 3090
- Optimized CUDA kernels with memory coalescing, delivering 1,216x speedup over CPU baseline and 400x vs OpenMP for 4K resolution (6ms rendering)
- Developed WebAssembly version using Emscripten for 5-20x faster browser performance than JavaScript
AgentMesh: Distributed Multi-Agent Coordination Framework
Python, Ray, AsyncIO | Aug 2024 - Nov 2024
- Architected distributed async agent coordination framework using Actor model to solve race conditions for concurrent LLM agents
- Implemented AST-based semantic conflict detection and three-way merge algorithm, achieving 85% automatic merge success rate
- Designed priority-based task scheduling supporting 1000+ pending tasks with P99 latency < 500ms
SmartNet: Explainability-Guided Neural Architecture Search
PyTorch, Flask | May 2024 - Aug 2024
- Developed first platform combining visual network construction with explainability-guided neural architecture search (NAS)
- Implemented multi-objective NAS jointly optimizing accuracy, explainability (SHAP/Fisher scores), efficiency, and speed
- Built modular block system with 6+ reusable components and web-based drag-and-drop interface with real-time code generation
Multi-Robot Communication System
PyTorch, ROS, CUDA | Sep 2020 - Apr 2023
- Led design and development of multi-agent reinforcement learning system improving robot coordination through distributed control and real-time communication
- Implemented custom PyTorch-based MARL algorithms (DDPG, MADDPG, QMIX) with CUDA parallelization, achieving 4x faster convergence
- Integrated ROS middleware for real-time robot communication, supporting SLAM localization and distributed control across heterogeneous platforms
- Validated system in simulation (Gazebo) and on physical robots (TurtleBot), improving task success rate by 30%
Education
M.S. in Computer Science
University of Southern California | Jul 2024 - Dec 2025
Los Angeles, CA
Ph.D. Candidate in Computer Science
University of Chinese Academy of Sciences | Aug 2020 - Mar 2023
Beijing, China
B.S. in Computer Science
Central South University | Sep 2016 - Jun 2020
Changsha, China
Additional Information
Work Authorization: F-1 Student Visa with OPT eligibility
Availability: Available for full-time positions starting January 2026
Languages: English (Fluent), Mandarin Chinese (Native)