Zhaohui Wang

Full Stack Software Engineer / ML Engineer

+1 (213) 910-9843
Los Angeles, CA

Professional Summary

Computer Science M.S. candidate at USC with extensive experience in full-stack development and machine learning systems. Proven track record at Meetfood, NSFocus, and Tencent building high-performance backend systems, mobile applications, and AI systems using Python, Java, TypeScript, and Rust. Deep expertise in multi-agent reinforcement learning, knowledge distillation, recommender systems, and high-performance computing, with demonstrated ability to translate cutting-edge AI research into production-ready solutions.

Core Competencies

Technical Skills

Languages: Python, Java, TypeScript, JavaScript, Rust, C/C++, CUDA, SQL
ML & DL: PyTorch, Transformers, PEFT, TRL, LangChain, scikit-learn
ML Specializations: Knowledge Distillation, Model Compression, Neural Architecture Search, Reinforcement Learning
NLP & LLMs: LoRA/QLoRA, RAG, SHAP, Fisher Information, Prompt Engineering
Recommender Systems: DeepFM, AutoInt, DIN, xDeepFM, DCNv2, Transformer4Rec
Web & Backend: Spring Boot, Node.js/Express, FastAPI, Flask, MongoDB, PostgreSQL, Elasticsearch, Redis
Frontend & Mobile: React, React Native, Svelte 5, Tailwind CSS, WebAssembly (Rust/WASM)
HPC & Parallel: CUDA, OpenMP, Ray, AsyncIO, ROS
Cloud & DevOps: AWS (EC2, S3, CloudFront, Lambda), Docker, Kubernetes, CI/CD (GitHub Actions)
Vector Search & Storage: FAISS, Chroma DB, IndexedDB

Core Strengths

Cross-functional Team Collaboration & Project Management
Full-Stack System Architecture Design
Fast Learner - Rapidly Apply New Technologies to Production
Problem Solving & Analytical Thinking
CI/CD Pipeline Optimization
Technical Documentation & Communication

Professional Experience

Full-Stack Software Engineer Intern

Meetfood | AWS, React Native, TypeScript, Node.js

May 2025 - Present | Los Angeles, CA

  • Engineered full-stack mobile application for restaurant discovery platform, leading development from architecture design to deployment
  • Built cross-platform mobile UI with React Native, TypeScript, and Jotai state management, implementing responsive design patterns and reusable component library to accelerate feature delivery by 20%
  • Developed RESTful backend APIs using Node.js, Express, and MongoDB, handling media uploads, user authentication, and real-time data synchronization for thousands of concurrent users
  • Architected scalable video processing pipeline with AWS MediaConvert and CloudFront CDN, optimizing client-side playback with adaptive bitrate streaming and reducing load latency by 20%
  • Deployed cloud infrastructure on AWS (EC2, S3, RDS) with automated CI/CD via GitHub Actions and CodePipeline, reducing deployment time by 25% and ensuring zero-downtime releases
  • Implemented real-time features including push notifications, live updates, and offline-first architecture with local caching and sync mechanisms

Full-Stack Software Engineer

NSFocus | Java, Python, React, Elasticsearch

Jun 2023 - Jun 2024 | Beijing, China

  • Developed full-stack network security analytics platform with React frontend and Spring Boot backend, supporting real-time threat visualization, log analysis, and incident response workflows
  • Built interactive dashboards with React, D3.js, and ECharts for security event visualization, implementing real-time updates via WebSocket and optimized rendering for 100K+ data points
  • Engineered backend microservices with Spring Boot and Python FastAPI, integrating Elasticsearch for log aggregation and reducing query latency by 35% through query optimization
  • Implemented authentication and authorization system with JWT tokens, role-based access control (RBAC), and session management for enterprise multi-tenant environment
  • Containerized application stack with Docker and established CI/CD pipelines, automating testing, building, and deployment processes

Backend Engineer Intern

Tencent | Spring Boot, MyBatis, Redis, Docker

Jul 2019 - Aug 2019 | Shenzhen, China

  • Developed backend services for enterprise messaging platform using Spring Boot and MyBatis, implementing RESTful APIs with Redis caching to support high-volume traffic
  • Optimized asynchronous processing with CompletableFuture and reactive programming patterns, achieving a 25% throughput improvement and a 30% reduction in response time
  • Deployed microservices to Kubernetes (TKE) with health checks, auto-scaling, and monitoring via Prometheus/Grafana, reducing deployment time by 40%

Key Projects & Achievements

LayerwiseAdapter: Multi-Teacher Fusion for Recommendation

PyTorch, CUDA | Jun 2024 - Sep 2024

  • Designed and implemented a novel 3-layer adaptive framework for multi-teacher knowledge fusion in recommendation systems, integrating traditional ML algorithms with LLMs through Fisher-guided knowledge distillation
  • Achieved RMSE of 0.8921 on MovieLens 1M, within 0.0011 of the best individual algorithm, AutoInt (0.8910), while maintaining 43.8% parameter reduction
  • Implemented 6 SOTA recommendation algorithms (DeepFM, AutoInt, xDeepFM, DIN, DCNv2, Transformer4Rec) with CUDA optimization, achieving 25x faster inference than LLM baseline
  • Developed Fisher Information-guided importance analysis for intelligent layer selection and pruning-aware knowledge distillation (PAKD), validated with 75% size reduction and 400% speedup

FisherLD: Fisher-Guided Knowledge Distillation for LLM Compression

PyTorch, Transformers | Oct 2024 - Dec 2024

  • Researched and implemented Fisher Information-guided layerwise distillation framework for efficient LLM-based recommendation systems
  • Trained 12-layer Transformer baseline on Amazon Electronics reviews (86K samples) achieving 87.53% test accuracy, then compressed to 6-layer student model
  • Designed layer importance analysis using Fisher Information Matrix and gradient norms, identifying the top 6 critical layers, which scored 1577x higher in importance, to guide a 50% layer reduction
  • Achieved compression breakthrough where student model outperformed teacher (54.9% vs 54.2%) with 43.8% parameter reduction and 44% model size reduction

Interruptr: Game-Theoretic Multi-Agent Code Analysis

Python, OpenAI API, Ollama | Oct 2024 - Nov 2024

  • Architected a heterogeneous multi-agent system with a 3+1 architecture (3 GPT experts + 1 local verifier), using game-theoretic optimization to target a cost-quality Nash equilibrium in code analysis
  • Achieved 22% cost reduction ($0.094 vs $0.120) and 13% quality improvement (F1: 0.85 vs 0.75) compared to single GPT-4, with 44.6% efficiency boost
  • Designed adversarial verification system using lightweight Qwen3 local model (0.5B) as quality detector, detecting 20%+ LLM hallucinations with less than 5% cost overhead
  • Implemented true parallel collaboration with async execution, achieving 3x speedup vs serial execution without quality degradation

Emotion-Aware Language Models with Audio Augmentation

PyTorch, Transformers, PEFT | Mar 2024 - Jul 2024

  • Researched methods for training language models that perceive and respond to emotional context from speech by integrating audio features into LLMs
  • Achieved 48% reduction in cross-speaker degradation (41.7x vs 80.2x) by training on real emotional speech (RAVDESS, 24 speakers) vs TTS-generated data
  • Identified a Feature-Generalization Paradox in which in-domain performance correlates inversely with cross-speaker robustness, establishing the need for strict cross-speaker evaluation
  • Fine-tuned Qwen2.5-1.5B-Instruct with 4-bit quantization and LoRA on emotional speech datasets

Persona-RAG: Memory-Enhanced Multi-Persona Conversational AI

PyTorch, LangChain, FAISS | Jan 2024 - May 2024

  • Developed lightweight personality-driven conversational AI combining MBTI personality models with memory-enhanced retrieval (RAG)
  • Implemented 3 distinct MBTI personalities using QLoRA fine-tuning on Qwen3-0.6B, achieving 97.3-97.6% token accuracy with only 39MB per personality adapter
  • Designed persona-aware memory retrieval algorithm with weighted scoring, achieving better contextual relevance than standard RAG
  • Built FAISS-based vector search system achieving <0.1s search time across 100+ memories with multi-user support

Music Style Filter PWA

Rust, WebAssembly, React, TypeScript | Sep 2024 - Dec 2024

  • Built production-grade Progressive Web App for real-time audio style filtering with 8 preset styles using DSP algorithms compiled to WebAssembly
  • Implemented audio processing engine in Rust with WASM compilation, achieving 5-10x performance improvement over JavaScript through SIMD optimizations
  • Architected offline-first PWA with service workers and IndexedDB caching, achieving Lighthouse score 95+ across all metrics

Browser ML Inference Engine

Rust, WASM, React, TypeScript | Jan 2024 - Dec 2024

  • Developed production-ready neural network inference engine running entirely in browser using Rust/WebAssembly, supporting 17+ operators
  • Engineered high-performance WASM runtime with SIMD acceleration, Web Workers, and INT8 quantization, achieving 10-50x speedup over JavaScript
  • Optimized binary size to under 1MB through aggressive optimization, tree shaking, and compression

Vector Database with Browser UI

Rust, WASM, React, IndexedDB | May 2024 - Nov 2024

  • Created production-grade browser-based vector database using Rust/WASM with HNSW indexing, supporting 7 distance metrics and persistent storage via IndexedDB
  • Implemented SIMD-accelerated vector operations achieving near-native performance in browser environment
  • Published npm package with full documentation and example applications

Desktop Monitoring Widget

Rust, Tauri 2.x, Svelte 5 | Aug 2024 - Dec 2024

  • Built cross-platform desktop application using Tauri 2.x with Svelte 5 frontend, supporting Windows, macOS, and Linux
  • Developed Rust backend for real-time system metrics collection with minimal performance overhead
  • Achieved sub-100ms launch time and under 50MB memory footprint through optimization

High-Performance Fractal Rendering Engine

C++, CUDA, OpenMP, WebAssembly | Sep 2024 - Dec 2024

  • Designed massively parallel fractal rendering engine achieving 2 billion pixels/sec on dual RTX 3090
  • Optimized CUDA kernels with memory coalescing, delivering 1,216x speedup over CPU baseline and 400x vs OpenMP for 4K resolution (6ms rendering)
  • Developed WebAssembly version using Emscripten for 5-20x faster browser performance than JavaScript

AgentMesh: Distributed Multi-Agent Coordination Framework

Python, Ray, AsyncIO | Aug 2024 - Nov 2024

  • Architected distributed async agent coordination framework using the Actor model to prevent race conditions among concurrent LLM agents
  • Implemented AST-based semantic conflict detection and three-way merge algorithm, achieving 85% automatic merge success rate
  • Designed priority-based task scheduling supporting 1000+ pending tasks with P99 latency < 500ms

SmartNet: Explainability-Guided Neural Architecture Search

PyTorch, Flask | May 2024 - Aug 2024

  • Developed first platform combining visual network construction with explainability-guided neural architecture search (NAS)
  • Implemented multi-objective NAS jointly optimizing accuracy, explainability (SHAP/Fisher scores), efficiency, and speed
  • Built modular block system with 6+ reusable components and web-based drag-and-drop interface with real-time code generation

Multi-Robot Communication System

PyTorch, ROS, CUDA | Sep 2020 - Apr 2023

  • Led design and development of multi-agent reinforcement learning system improving robot coordination through distributed control and real-time communication
  • Implemented custom PyTorch-based MARL algorithms (DDPG, MADDPG, QMIX) with CUDA parallelization, achieving 4x faster convergence
  • Integrated ROS middleware for real-time robot communication, supporting SLAM localization and distributed control across heterogeneous platforms
  • Validated system across simulated (Gazebo) and physical robots (TurtleBot), improving task success rate by 30%

Education

M.S. in Computer Science

University of Southern California | Jul 2024 - Dec 2025

Los Angeles, CA

Ph.D. Candidate in Computer Science

University of Chinese Academy of Sciences | Aug 2020 - Mar 2023

Beijing, China

B.S. in Computer Science

Central South University | Sep 2016 - Jun 2020

Changsha, China

Additional Information

Work Authorization: F-1 Student Visa with OPT eligibility

Availability: Available for full-time positions starting January 2026

Languages: English (Fluent), Mandarin Chinese (Native)