WIP
Research
LinkerLLM
Multi-model LLM serving system with weight-section sharing inspired by runtime linkers
// DESCRIPTION
Exploring how runtime linker concepts (shared libraries loaded once and mapped into many processes) can enable efficient multi-model serving on limited hardware.
Concept
Traditional: Model A [Weights A] + Model B [Weights B] = 2x Memory
LinkerLLM:   Model A [Shared + A-specific] + Model B [Shared + B-specific] = ~1.3x Memory
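The ~1.3x figure above falls out of simple arithmetic: one copy of the shared sections plus per-model unique sections. A minimal sketch (the function name and the ~70% sharing figure are illustrative assumptions, not measured results):

```python
def serving_memory_factor(n_models: int, shared_frac: float) -> float:
    """Memory cost relative to a single model: one shared copy of the
    shared sections, plus each model's unique sections."""
    return shared_frac + n_models * (1.0 - shared_frac)

# If ~70% of weights are shared, two models cost ~1.3x one model
# (versus 2x when each model loads a full private copy).
print(serving_memory_factor(2, 0.7))
```

Each additional model then adds only `1 - shared_frac` of a full model's memory, which is what makes the approach scale past two models.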
Approach
- Weight Section Identification: partition each model's parameters into sections shared across models and sections unique to one model
- Dynamic Loading: load shared sections once; swap only the model-specific sections when switching models
- Memory Mapping: map weight sections into memory for efficient access patterns across models
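The identification step above can be sketched as a content-hash partition over two state dicts. This is a simplified sketch under the assumption that "shared" means byte-identical tensors with the same parameter name; `section_hash` and `split_shared_unique` are hypothetical helpers, and NumPy arrays stand in for real checkpoint tensors:

```python
import hashlib
import numpy as np

def section_hash(tensor: np.ndarray) -> str:
    """Fingerprint a weight section by its raw bytes."""
    return hashlib.sha256(tensor.tobytes()).hexdigest()

def split_shared_unique(model_a: dict, model_b: dict):
    """Partition parameters into shared (byte-identical under the same
    name) and model-specific sections."""
    shared, unique_a, unique_b = {}, {}, {}
    for name, w in model_a.items():
        other = model_b.get(name)
        if other is not None and section_hash(w) == section_hash(other):
            shared[name] = w  # load once, map into both models
        else:
            unique_a[name] = w
    for name, w in model_b.items():
        if name not in shared:
            unique_b[name] = w
    return shared, unique_a, unique_b
```

A real implementation would likely hash at a finer granularity than whole tensors and tolerate near-identical sections, but exact-match hashing is enough to show the partition.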
Target Hardware
Dual NVIDIA RTX 3090 (2 x 24 GB = 48 GB total VRAM) serving multiple 7B+ models
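A rough feasibility check for this hardware target: a 7B-parameter model in fp16 takes ~14 GB of weights, so sharing determines how many models fit in 48 GB. Illustrative arithmetic only, ignoring activation and KV-cache memory; the 70% sharing fraction is an assumption, not a measurement:

```python
def fits(n_models: int, params_b: float = 7e9, bytes_per_param: int = 2,
         shared_frac: float = 0.7, vram_gb: float = 48.0) -> bool:
    """Rough check: do n models' weights fit in VRAM, given one shared
    copy plus per-model unique sections? Ignores activations/KV cache."""
    per_model_gb = params_b * bytes_per_param / 1e9  # ~14 GB for 7B fp16
    needed_gb = per_model_gb * (shared_frac + n_models * (1.0 - shared_frac))
    return needed_gb <= vram_gb
```

Under these assumptions, full private copies cap out at three 7B models (3 x 14 GB = 42 GB), while 70% sharing leaves room for several more.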
// HIGHLIGHTS
- Novel application of runtime linker concepts to LLM weight sharing
- Multi-model serving research
- Experimental framework in progress