LinkerLLM

Multi-model LLM serving system with weight section sharing inspired by runtime linkers

status ACTIVE
type Research
started 2024-11-01
stack Python PyTorch CUDA Transformers

// DESCRIPTION

Weight-Sharing Runtime for Multi-Model LLM Serving

LinkerLLM is a serving runtime that exploits weight sharing across multiple fine-tuned LLM variants to dramatically reduce memory consumption and improve throughput in multi-tenant deployment scenarios. When organizations serve dozens of task-specific fine-tunes of the same base model, LinkerLLM identifies and deduplicates shared weight blocks at the tensor level.
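Tensor-level deduplication can be pictured as content-addressing each weight block and keeping one shared copy per distinct hash. This is an illustrative sketch only; the function and data-structure names (`dedup_weights`, the byte-blob representation of tensors) are hypothetical, not the actual LinkerLLM interface.

```python
import hashlib

def dedup_weights(models):
    """Share identical weight blocks across model variants by content hash.

    `models` maps model name -> {tensor name -> raw bytes}. Returns a single
    shared store plus per-model lookup tables pointing into it, so a tensor
    that appears in many fine-tunes is kept only once.
    (Hypothetical sketch, not the real LinkerLLM API.)
    """
    store = {}   # content hash -> one shared copy of the weight block
    tables = {}  # model name -> {tensor name -> content hash}
    for model, tensors in models.items():
        table = {}
        for name, blob in tensors.items():
            h = hashlib.sha256(blob).hexdigest()
            store.setdefault(h, blob)  # first writer stores; later models share
            table[name] = h
        tables[model] = table
    return store, tables
```

Two fine-tunes of the same base then reference one physical copy of every unchanged layer, and only their differing tensors consume extra memory.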

The runtime implements a copy-on-write memory management strategy where base model weights are loaded once and shared across all variants, with only the delta weights (LoRA adapters, fine-tuned layers) allocated per-model. A custom CUDA kernel handles the dynamic weight composition at inference time, fusing base and delta weights with minimal latency overhead.
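The composition the kernel performs is the standard LoRA update y = Wx + alpha * B(Ax): the full base matrix W is shared read-only, while only the small per-variant adapter matrices A and B are allocated per model. A minimal pure-Python sketch of that arithmetic (function names hypothetical; the real system does this in a fused CUDA kernel):

```python
def matvec(m, v):
    # Plain matrix-vector product over nested lists (stand-in for a GPU GEMV).
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(x, W, A, B, alpha=1.0):
    """y = W x + alpha * B (A x).

    W is the shared base weight (d_out x d_in); A (r x d_in) and B
    (d_out x r) are the low-rank per-variant deltas, with r << d_in.
    (Hypothetical sketch of the composition, not the CUDA kernel itself.)
    """
    base = matvec(W, x)            # shared across every variant
    delta = matvec(B, matvec(A, x))  # per-variant low-rank correction
    return [b + alpha * d for b, d in zip(base, delta)]
```

Because A and B have rank r much smaller than the hidden size, the per-variant memory cost is a small fraction of the base model's.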

In benchmarks with 16 concurrent model variants of Llama-2-7B, LinkerLLM achieves 12x memory reduction compared to naive per-model loading, while maintaining 95%+ of single-model inference throughput. The scheduler intelligently batches requests across variants that share the same base layers, further amortizing compute costs.
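The cross-variant batching idea reduces to grouping pending requests by their shared base model, so one forward pass over the common layers serves many variants at once. A minimal sketch under assumed request shapes (the `base`/`variant` keys are hypothetical, not LinkerLLM's scheduler interface):

```python
from collections import defaultdict

def batch_by_base(requests):
    """Group requests that share the same base model into one batch.

    `requests` is a list of dicts with at least a "base" key naming the
    shared base model; each batch can run the common layers once and
    apply per-variant deltas afterwards. (Hypothetical sketch.)
    """
    batches = defaultdict(list)
    for req in requests:
        batches[req["base"]].append(req)
    return dict(batches)
```

In practice the scheduler would also weigh sequence lengths and adapter ranks when forming batches, but the grouping key is the shared base.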

The system supports hot-swapping of model variants without service interruption and integrates with standard serving frameworks (vLLM, TGI) as a drop-in memory management backend.
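Zero-downtime swapping is commonly achieved by atomically replacing the table that maps variant names to adapters, so in-flight readers see either the old or the new table but never a partially updated one. A stdlib-only sketch of that pattern (class and method names are assumptions, not the real API):

```python
import threading

class VariantRegistry:
    """Hot-swap model variants by replacing the adapter table wholesale.

    Writers copy-and-swap under a lock; readers take a lock-free snapshot
    of the current table, so serving is never interrupted mid-update.
    (Hypothetical sketch of the swap pattern.)
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._adapters = {}  # variant name -> adapter object

    def swap(self, name, adapter):
        with self._lock:
            new_table = dict(self._adapters)  # copy
            new_table[name] = adapter
            self._adapters = new_table        # atomic reference swap

    def get(self, name):
        return self._adapters.get(name)  # reads a consistent snapshot
```

The same copy-and-swap move also handles removal: build a new table without the retiring variant and publish it once its in-flight requests drain.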

// HIGHLIGHTS

  • 12x memory reduction when serving 16 concurrent fine-tuned model variants
  • Copy-on-write tensor management with custom CUDA kernel for dynamic weight composition
  • 95%+ throughput retention compared to dedicated single-model serving
  • Hot-swappable model variants with zero-downtime deployment
  • Compatible with vLLM and TGI as a drop-in memory management backend