Wentao's Blog

Bi-weekly Journal: Contributions to vLLM (2026)

yewentao, published on 2026-05-30, filed under Vllm

My bi-weekly journal of contributions to vLLM (2026).

yewentao, published on 2026-01-12, filed under Vllm, Presentation

An introduction to batch invariance

yewentao, published on 2025-09-28, filed under Paper_summary

Summary of the paper ‘Efficiently Modeling Long Sequences with Structured State Spaces’

yewentao, published on 2025-09-14, filed under Paper_summary

Paper overview: ‘Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms’

yewentao, published on 2025-09-06, filed under Paper_summary, Pytorch

Paper overview: ‘PyTorch: An Imperative Style, High-Performance Deep Learning Library’

yewentao, published on 2025-08-30, filed under Paper_summary

Paper overview: ‘AWQ: Activation-Aware Weight Quantization for on-device LLM Compression and Acceleration’

yewentao, published on 2025-08-24, filed under Paper_summary

Paper overview: ‘Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve’

yewentao, published on 2025-08-17, filed under Paper_summary

Paper overview: ‘SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills’

yewentao, published on 2025-08-09, filed under Paper_summary

Paper overview: ‘SGLang: Efficient Execution of Structured Language Model Programs’

yewentao, published on 2025-08-09, filed under Vllm

My bi-weekly journal of contributions to vLLM (2025).