Summary: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (06-08)
Summary: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (05-17)