Paper_summary - Category - Wentao's Blog

Paper_summary

2025

Summary: Efficiently Modeling Long Sequences with Structured State Spaces 09-28

Summary: Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms 09-14

Summary: PyTorch: An Imperative Style, High-Performance Deep Learning Library 09-06

Summary: AWQ: Activation-Aware Weight Quantization for on-device LLM Compression and Acceleration 08-30

Summary: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve 08-24

Summary: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills 08-17

Summary: SGLang: Efficient Execution of Structured Language Model Programs 08-09

Summary: MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models 08-03

Summary: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test 07-26

Summary: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model 07-20
