All Categories
Paper_summary (27)
Summary: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Summary: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Summary: DeepSeek-V3 Technical Report
Summary: FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Summary: FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
More >>
Pytorch (18)
GPU Puzzles
Tensor Puzzles
Deep Dive into PyTorch Contiguous Operator(4)
Distributed Training Strategy Introduction
Deep Dive into PyTorch Device Copy Operations
More >>
Csapp (10)
ProxyLab
MallocLab
ShellLab
Cachelab
CSAPP Class Notes(4)
More >>
Server (7)
Ceph Learning Notes
Nginx Learning Notes
Kafka Learning Notes
Protobuf Learning Notes
Mysql Learning Notes
More >>
Tvm (5)
TVM: 2D Depth Conv GPU Optimization
TVM: GEMM GPU Optimization
TVM: 1D convolution GPU Optimization
TVM: 1D convolution CPU Optimization
Summary: TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Technical_notes (4)
2025 Technical Notes(3)
2025 Technical Notes(2)
2025 Technical Notes(1)
2024 Technical Notes
Algorithm (2)
Understand Dynamic Programming
Understand Lightgbm
Llm (2)
Llama_index Source Code Analysis (1)
Llama_index Source Code Analysis (2)
Vllm (1)
Summary: Efficient Memory Management for Large Language Model Serving with PagedAttention