Summary for paper ‘DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model’
Summary for paper ‘DeepSeek-V3 Technical Report’
Summary for paper ‘FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision’
Summary for paper ‘FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving’
Summary for paper ‘Training Compute-Optimal Large Language Models’
Summary for paper ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’
Summary for paper ‘DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale’
Summary for paper ‘FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning’
Technical notes from 2025 (3).
Summary for paper ‘FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness’