Summary for paper ‘Efficiently Modeling Long Sequences with Structured State Spaces’
Summary for paper ‘Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms’
Summary for paper ‘PyTorch: An Imperative Style, High-Performance Deep Learning Library’
Summary for paper ‘AWQ: Activation-Aware Weight Quantization for on-device LLM Compression and Acceleration’
Summary for paper ‘Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve’
Summary for paper ‘SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills’
Summary for paper ‘SGLang: Efficient Execution of Structured Language Model Programs’
My bi-weekly journal for contributions to vLLM.
Summary for paper ‘MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models’
Summary for paper ‘EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test’