Paper Quick Read: ‘Training Compute-Optimal Large Language Models’
Paper Quick Read: ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer’
Paper Quick Read: ‘DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale’
Paper Quick Read: ‘FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning’
2025 Technical Notes (Part 3)
Paper Quick Read: ‘FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness’
Paper Quick Read: ‘DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving’
Paper Quick Read: ‘Fast Inference from Transformers via Speculative Decoding’
Paper Quick Read: ‘MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation’
Paper Quick Read: ‘Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing’