Summary: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test 07-26
Summary: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale 06-08