Paper Overview: ‘EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test’
Paper Overview: ‘DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model’
Paper Overview: ‘DeepSeek-V3 Technical Report’
Paper Overview: ‘FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision’
Paper Overview: ‘FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving’
Paper Overview: ‘Training Compute-Optimal Large Language Models’
Paper Overview: ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’
Paper Overview: ‘DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale’
Paper Overview: ‘FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning’
2025 Technical Notes (Part 3)