Wentao's Blog

Summary: Training Compute-Optimal Large Language Models

yewentao, published 2025-06-22, filed under Paper_summary

Summary: ‘Training Compute-Optimal Large Language Models’

yewentao, published 2025-06-15, filed under Paper_summary

Summary: ‘Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer’

yewentao, published 2025-06-08, filed under Paper_summary

Summary: ‘DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale’

yewentao, published 2025-05-28, filed under Paper_summary

Summary: ‘FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning’

yewentao, published 2025-05-24, filed under Technical_notes

2025 Technical Notes (Part 3)

yewentao, published 2025-05-22, filed under Paper_summary

Summary: ‘FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness’

yewentao, published 2025-05-17, filed under Paper_summary

Summary: ‘DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving’

yewentao, published 2025-05-11, filed under Paper_summary

Summary: ‘Fast Inference from Transformers via Speculative Decoding’

yewentao, published 2025-04-29, filed under Paper_summary

Summary: ‘MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation’

yewentao, published 2025-04-27, filed under Paper_summary

Summary: ‘Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing’