Massive Models in Low Precision: Power, Limits, and Scaling Laws
Abstract: Modern large language models have billions to trillions of parameters, creating enormous computational and memory costs. Quantization, i.e., reducing their numerical precision, is the leading practical mitigation strategy. But how far can we push it, and what do we lose? This talk addresses several facets of this question. First, for post-training quantization, we characterize the accuracy–compression frontier, focusing on large-scale evaluations and new formats. Second, for quantization-aware training, we show that convergence behavior is predicted by representation scaling laws,…



