Summary for paper ‘Training Compute-Optimal Large Language Models’
Summary for paper ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’
Summary for paper ‘DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale’
Summary for paper ‘FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning’
Technical notes from 2025 (3).
Summary for paper ‘FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness’
Summary for paper ‘DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving’
Summary for paper ‘Fast Inference from Transformers via Speculative Decoding’
Summary for paper ‘MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation’
Summary for paper ‘Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing’