Summary: AWQ: Activation-Aware Weight Quantization for On-Device LLM Compression and Acceleration 08-30
Summary: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test 07-26