ACTIVE Research

FisherKD-Unified

Fisher Information-guided adaptive knowledge distillation framework implementing 12+ distillation methods

status ACTIVE
type Research
started 2024-10-01
stack Python PyTorch Transformers WandB

// DESCRIPTION

Fisher Information Knowledge Distillation

FisherKD-Unified introduces a principled knowledge distillation framework grounded in Fisher Information, providing a unified theoretical lens for understanding and optimizing the transfer of knowledge from large teacher models to compact student networks. The method weights each parameter dimension by its estimated importance during distillation, focusing the student on the teacher's most informative feature directions.

The core insight is that Fisher Information captures the curvature of the loss landscape around the teacher's parameters, enabling the student to prioritize learning the representations that matter most for downstream performance. This avoids the well-known pitfall of uniform KD where noisy or redundant teacher signals degrade student quality.
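As a rough sketch of how such importances might be obtained in practice, the diagonal empirical Fisher can be estimated as the mean of squared per-parameter gradients under the teacher's loss. The function below is illustrative only (its name, signature, and batch budget are not from the paper):

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn, n_batches=10):
    """Estimate the diagonal empirical Fisher per parameter as the
    running mean of squared loss gradients over a few batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for i, (x, y) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    # normalize by the number of batches actually processed
    return {n: f / min(n_batches, i + 1) for n, f in fisher.items()}
```

In this sketch, the resulting per-parameter scores (non-negative by construction) would then serve as the importance weights applied during distillation.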

The paper has been accepted at TMLR (Transactions on Machine Learning Research), validating the approach against state-of-the-art distillation baselines across vision and language benchmarks. Results demonstrate consistent improvements in student accuracy with minimal computational overhead beyond standard KD training.

The framework is modular and can be integrated with existing KD pipelines (logit-based, feature-based, or attention-based) as a drop-in importance weighting mechanism, making it practical for production deployment scenarios where model compression is essential.
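To make the "drop-in weighting" idea concrete, here is a minimal hypothetical example of applying per-dimension importance weights inside a standard logit-based KD loss. The function name, the normalization scheme, and the placement of the weights are assumptions for illustration, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def fisher_weighted_kd_loss(student_logits, teacher_logits, fisher_diag, T=2.0):
    """Logit-based KD where each output dimension's KL contribution is
    scaled by a normalized Fisher importance weight.
    fisher_diag: non-negative tensor of shape (num_classes,)."""
    w = fisher_diag / fisher_diag.sum()              # normalize importances
    p_t = F.softmax(teacher_logits / T, dim=-1)      # soft teacher targets
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    # per-dimension KL terms, weighted by Fisher importance
    kl = p_t * (torch.log(p_t + 1e-8) - log_p_s)     # (batch, num_classes)
    return (T * T) * (kl * w).sum(dim=-1).mean()
```

Setting `w` to a uniform vector recovers plain temperature-scaled KD, which is what makes this style of weighting compatible with existing pipelines.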

// HIGHLIGHTS

  • Accepted at TMLR (Transactions on Machine Learning Research)
  • Unified Fisher Information framework for knowledge distillation across logit, feature, and attention transfer
  • Consistent accuracy improvements over vanilla KD on vision (CIFAR-100, ImageNet) and language benchmarks
  • Minimal computational overhead: Fisher weighting adds less than 5% to standard KD training cost
  • Drop-in compatible with existing KD pipelines for practical production deployment