// Research
COMPLETED
> Infer-GC
Garbage-collection-style memory management for LLM inference, achieving a 37-52% reduction in GPU memory usage
Python
PyTorch
CUDA
Transformers