yewentao
Published on 2025-04-07, in category TVM. This blog post demonstrates optimization techniques for 2D depthwise convolution on GPUs using TVM, including block and thread organization, memory-hierarchy exploitation, and dimension fusion.
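As a point of reference for the computation the post optimizes, here is a minimal plain-NumPy sketch of a 2D depthwise convolution (valid padding, stride 1). This is an assumption about the exact kernel the post targets, not the post's TVM schedule itself; in TVM the same loop nest would be expressed with `te.compute` and the fused channel/spatial axes bound to `blockIdx`/`threadIdx` as the excerpt describes.

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Reference 2D depthwise convolution (valid padding, stride 1).

    x: (C, H, W) input; w: (C, K, K), one filter per channel.
    Each channel is convolved only with its own filter -- this is the
    loop nest a TVM GPU schedule would tile, fuse, and bind to
    blocks/threads (hypothetical sketch, not the post's code).
    """
    C, H, W = x.shape
    _, K, _ = w.shape
    out = np.zeros((C, H - K + 1, W - K + 1), dtype=x.dtype)
    for c in range(C):                      # independent per channel
        for i in range(H - K + 1):          # output row
            for j in range(W - K + 1):      # output column
                out[c, i, j] = np.sum(x[c, i:i + K, j:j + K] * w[c])
    return out
```

In a TVM schedule, the `c`, `i`, and `j` loops above are what get reorganized: the channel and block-level spatial loops are fused and mapped to GPU thread blocks, while inner tiles are mapped to threads and staged through shared memory.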