Overview
An interactive technical essay that explains how quantization reduces the memory requirements of Large Language Models by converting high-precision numbers to lower-precision formats. It also shows that outlier values are critical to model performance: removing even a single "super weight" can make a model fail outright.
The Breakdown
- Floating point representation - Shows how numbers are stored in binary using sign, exponent, and significand bits through interactive visualizations
- Outlier value preservation - Rare, extreme values far outside the typical weight distribution matter so much that removing a single one can make a model output gibberish
- Quantization impact measurement - Uses perplexity and KL divergence metrics to show that 16-bit to 8-bit quantization has almost no quality penalty
- Performance vs quality tradeoffs - 16-bit to 4-bit quantization maintains about 90% of original model quality while significantly reducing memory requirements
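The sign/exponent/significand layout described in the first bullet can be made concrete with a short sketch. This is a minimal illustration using Python's standard `struct` module to pull apart a 32-bit IEEE 754 float; the function names are illustrative, not from the essay.

```python
import struct

def float_bits(x: float):
    """Decompose a 32-bit float into its sign (1 bit),
    exponent (8 bits), and significand (23 bits)."""
    (b,) = struct.unpack(">I", struct.pack(">f", x))
    sign = b >> 31
    exponent = (b >> 23) & 0xFF
    significand = b & 0x7FFFFF
    return sign, exponent, significand

def reconstruct(sign: int, exponent: int, significand: int) -> float:
    """Rebuild a normal float from its parts:
    (-1)^sign * 1.significand * 2^(exponent - 127)."""
    frac = 1.0 + significand / 2**23
    return (-1) ** sign * frac * 2.0 ** (exponent - 127)
```

For example, `float_bits(1.0)` yields sign 0, exponent 127 (the bias), and significand 0; lower-precision formats like float16 follow the same scheme with fewer exponent and significand bits.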
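The outlier point can also be demonstrated numerically. The sketch below uses a common absmax int8 quantization scheme (an assumption; the essay's exact method may differ): because the scale must stretch to cover the largest magnitude, a single outlier "super weight" degrades the precision of every other weight.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map the largest magnitude to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=1024).astype(np.float32)

# Quantization error on a well-behaved weight tensor.
q, s = quantize_int8(w)
err_normal = np.abs(dequantize(q, s) - w).mean()

# Add one outlier: the absmax scale grows to cover it,
# so all the other weights lose resolution.
w_out = w.copy()
w_out[0] = 5.0
q2, s2 = quantize_int8(w_out)
err_outlier = np.abs(dequantize(q2, s2) - w_out)[1:].mean()
```

This is why practical quantization schemes keep outliers in higher precision rather than discarding or naively rounding them.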
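The two metrics named in the breakdown have simple definitions worth stating. Perplexity is the exponential of the mean negative log-likelihood over tokens, and KL divergence measures how far the quantized model's output distribution drifts from the original's. A minimal sketch (illustrative helper names, not the essay's code):

```python
import numpy as np

def perplexity(token_logprobs) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token.
    Lower is better; an unchanged model keeps perplexity unchanged."""
    return float(np.exp(-np.mean(token_logprobs)))

def kl_divergence(p, q) -> float:
    """KL(p || q) between two discrete next-token distributions.
    Zero when the quantized model matches the original exactly."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(p * np.log(p / q)))
```

In practice these are averaged over a held-out corpus; a near-zero KL divergence between the 16-bit and 8-bit models is what "almost no quality penalty" means concretely.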