PyTorch Compiler Introduction
This article explores the basic concepts of CUDA streams, parallel execution, and multi-GPU synchronization strategies. We analyze the advantages of using multiple CUDA streams, how to ensure task synchronization through CUDA events, and how streams can be used to optimize program performance.
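As a rough illustration of the idea (not taken from the article), here is a minimal PyTorch sketch; the names `s1`, `s2`, and `done` are arbitrary, and it assumes a CUDA-capable GPU:

```python
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()
done = torch.cuda.Event()

with torch.cuda.stream(s1):
    x = a @ a          # matmul queued on stream s1
    done.record(s1)    # mark completion of the work on s1

with torch.cuda.stream(s2):
    s2.wait_event(done)  # s2 waits until s1's work has finished
    y = x @ b            # now safe to consume x on s2

torch.cuda.synchronize()  # block the host until all streams finish
```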
This document provides a comprehensive overview of distributed training capabilities within PyTorch. Covering the core components of torch.distributed, it delves into Distributed Data-Parallel Training (DDP), RPC-Based Distributed Training, and Collective Communication (c10d).
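For orientation, a minimal DDP sketch might look like the following; the backend choice and the assumption of a `torchrun` launch are mine, not details from the document:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes the process was launched with torchrun, which sets
    # RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT in the environment.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(10, 10).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(32, 10, device=rank)
    loss = ddp_model(inputs).sum()
    loss.backward()          # c10d all-reduce runs during the backward pass
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```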
In malloc lab, we will implement our own versions of malloc, free, and realloc.
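For context (not part of the lab itself), the allocator contract these functions follow can be exercised from Python via ctypes; this sketch assumes a Unix-like system where the C library can be located:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.realloc.restype = ctypes.c_void_p
libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

buf = libc.malloc(16)        # request a 16-byte block
buf = libc.realloc(buf, 64)  # grow the block, possibly moving it
libc.free(buf)               # return the block to the allocator
```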
This article introduces the implementation details of PyTorch's broadcast mechanism, including the forward and backward computation.
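As a quick, informal illustration of the broadcasting semantics involved (not taken from the article): shapes are aligned from the trailing dimension, size-1 dimensions expand, and gradients are summed back down to each input's original shape.

```python
import torch

a = torch.randn(4, 1, 3, requires_grad=True)  # shape (4, 1, 3)
b = torch.randn(2, 3, requires_grad=True)     # shape    (2, 3)

c = a + b
print(c.shape)  # torch.Size([4, 2, 3]): broadcast result

c.sum().backward()
# Gradients are reduced over the broadcast dimensions in the backward pass.
print(a.grad.shape, b.grad.shape)  # torch.Size([4, 1, 3]) torch.Size([2, 3])
```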
This article dissects PyTorch’s C++ core to uncover the mechanics of tensor indexing and assignment. From translating Python indices to C++ TensorIndex to the nuances of handleDimInMultiDimIndexing, we explore both basic and advanced tensor operations.
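For a rough sense of what basic versus advanced indexing means at the Python level (the C++ details are the article's subject; this snippet is only an illustration):

```python
import torch

t = torch.arange(12).reshape(3, 4)

# Basic indexing: integers and slices produce views of the original storage.
row = t[1]           # shape (4,)
block = t[:2, 1:3]   # shape (2, 2), still a view

# Advanced indexing: index tensors and boolean masks produce copies.
picked = t[torch.tensor([0, 2]), torch.tensor([1, 3])]  # elements (0,1) and (2,3)
masked = t[t > 5]                                        # 1-D tensor of matching elements

# Indexed assignment writes through to the original tensor.
t[0, :] = -1
print(t[0])  # tensor([-1, -1, -1, -1])
```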
This article introduces the implementation details of PyTorch's autograd mechanism.
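A tiny sketch of what autograd does at the user level, just for orientation (the article covers the machinery underneath):

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # forward pass records the computation graph

y.backward()        # reverse-mode traversal of the graph
print(x.grad)       # tensor([4., 6.]), i.e. dy/dx = 2x
```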
In shell lab, we’ll become more familiar with the concepts of process control and signals by writing a simple Unix shell program that supports job control. Source: [https://github.com/yewentao256/CSAPP_15213/tree/main/shelllab]
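As a loose illustration of the process-control and signal primitives involved (a Python sketch rather than the lab's C skeleton; it assumes a Unix-like system):

```python
import os
import signal

# Keep Ctrl-C from killing the shell itself; the foreground child still receives it.
signal.signal(signal.SIGINT, lambda signum, frame: print("\ncaught SIGINT"))

argv = input("tsh> ").split()
if argv:
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # child replaces itself with the command
        finally:
            os._exit(1)               # exec failed: never fall back into the shell
    else:
        os.waitpid(pid, 0)            # parent waits for the foreground job
```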
Uncover the inner workings of PyTorch through a deep dive into the contiguous operator, from its Python interface to its dispatching and registration process, and finally how it is executed.
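As a small user-level illustration of the operator the article traces through the dispatcher (this snippet shows only the Python-visible behavior):

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                      # transpose returns a view with swapped strides

print(t.is_contiguous())       # False: layout no longer matches row-major order
c = t.contiguous()             # dispatches to a kernel that copies into a compact layout

print(c.is_contiguous())       # True
print(t.stride(), c.stride())  # (1, 3) vs (2, 1)
```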