Summary: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (06-08)
Summary: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (05-17)