Ages: 11–15 · Duration: 105 minutes · Topics: Image representation, Information theory, Lossless & lossy compression, Human visual perception, Discrete Cosine Transform
"Your phone takes a photo – 12 megapixels, stunning detail. The raw data is 36 million bytes. But the file you share is 2 million bytes. Somehow, 34 million bytes vanished – and nobody can tell. Where did they go? Today, we find out – and the answer involves physics, psychology, and beautiful mathematics."
Let's start with something basic. What do you see?
You see light. Specifically, electromagnetic radiation β waves with wavelengths between about 380 nm (violet) and 700 nm (red). Your eye has three types of cone cells:
| Cone type | Peak sensitivity | Colour perceived |
|---|---|---|
| S (Short) | ~420 nm | Blue-violet |
| M (Medium) | ~530 nm | Green |
| L (Long) | ~560 nm | Red-yellow |
Every colour you've ever seen is your brain's interpretation of how strongly these three cone types fired. A pure yellow light (580 nm) stimulates your L and M cones – but so does a mixture of red and green light. Your brain can't tell the difference!
💡 "This is why screens only need three colours. They're not showing you real yellow – they're tricking your cones with a red–green cocktail."
| Question | Answer |
|---|---|
| How many cone types do humans have? | 3 (S, M, L) |
| Can a screen make real yellow light? | No – it mixes red + green to fake it |
| Why do TVs use Red, Green, Blue? | Those roughly match our 3 cone types |
| How many distinct colours can 24-bit RGB represent? | $256^3 = 16{,}777{,}216$ |
A digital image is a rectangular grid of tiny coloured squares called pixels. Each pixel stores three numbers β one for Red, one for Green, one for Blue β each between 0 and 255. That's all. Every photo you've ever taken is just millions of number triplets.
The simplest image format is BMP (Bitmap). It stores every pixel's RGB values with a small header.
Question: A photo from your phone is 4000 × 3000 pixels. At 3 bytes per pixel, how big is it as a raw BMP?

$$4000 \times 3000 \times 3 = 36{,}000{,}000 \text{ bytes} = 36 \text{ MB}$$
```
┌────────────────────────────────────────────┐
│ BMP Header (14 bytes)                      │
│   File size, offset to pixel data          │
├────────────────────────────────────────────┤
│ DIB Header (40 bytes)                      │
│   Width, height, bits per pixel, etc.      │
├────────────────────────────────────────────┤
│ Pixel Data (width × height × 3 bytes)      │
│   B₀G₀R₀ B₁G₁R₁ B₂G₂R₂ B₃G₃R₃ ...          │
│   (bottom row first, left to right)        │
└────────────────────────────────────────────┘
```
Fun fact: BMP stores pixels bottom-up and in BGR order (blue first). A quirk from early Windows graphics.
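To make the layout concrete, here is a minimal sketch (Python, standard library only) that packs a tiny 2×2 image the way BMP does and then reads the header fields back. `make_bmp` is an illustrative helper, not a full-featured BMP writer:

```python
import struct

# Pack a 2x2 image as a 24-bit BMP: 14-byte file header, 40-byte DIB
# header, then pixel data stored bottom-up, each pixel in B, G, R order.
def make_bmp(width, height, rows_top_down):
    row_size = (width * 3 + 3) // 4 * 4          # rows pad to 4 bytes
    pixel_data = bytearray()
    for row in reversed(rows_top_down):          # bottom row first!
        for (r, g, b) in row:
            pixel_data += bytes([b, g, r])       # BGR order!
        pixel_data += bytes(row_size - width * 3)
    file_size = 14 + 40 + len(pixel_data)
    header = struct.pack("<2sIHHI", b"BM", file_size, 0, 0, 54)
    dib = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                      0, len(pixel_data), 2835, 2835, 0, 0)
    return header + dib + bytes(pixel_data)

bmp = make_bmp(2, 2, [[(255, 0, 0), (0, 255, 0)],
                      [(0, 0, 255), (255, 255, 255)]])
width, height = struct.unpack_from("<ii", bmp, 18)
bpp, = struct.unpack_from("<H", bmp, 28)
print(width, height, bpp)   # 2 2 24
```

Parsing the file we just built confirms the quirk: the first bytes of pixel data (offset 54) belong to the bottom-left pixel, blue channel first.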
BMP is perfectly faithful but catastrophically wasteful. The blue pixel (0, 70, 173) appears 12 times. We wrote those same 3 bytes twelve times. Redundancy is an invitation for compression.
A 12-megapixel photo is 36 MB as a BMP. We want it smaller β much smaller. But there are two very different philosophies of compression, and each format picks a side.
| | Lossless | Lossy |
|---|---|---|
| Promise | Get back every single bit | Get back something that looks the same |
| Analogy | Zipping a Word document | Summarising a novel |
| Typical ratio | 2:1 to 5:1 | 10:1 to 50:1 |
| Good for | Text, logos, medical scans | Photos, video, music |
| Formats | PNG, GIF, TIFF, ZIP | JPEG, MP3, H.264 |
💡 "Lossless compression is like packing a suitcase perfectly – everything fits, and when you unpack, nothing's missing. Lossy compression is like leaving out the socks nobody will see."
Key idea: Reduce the palette to at most 256 colours, then compress with LZW.
Step 1: Colour quantization. A photo might use millions of colours. GIF picks the best 256 and maps every pixel to the nearest one – each pixel is now one byte instead of three.
Step 2: LZW Compression (Lempel-Ziv-Welch). A dictionary-based method – the encoder builds a dictionary of recurring sequences on the fly:
```
Input: A B A B A B A B ...

Step 1: A → code 0   (initial dictionary: {A:0, B:1})
Step 2: B → code 1
Step 3: AB not in dict → add AB:2, output A (0)
Step 4: BA not in dict → add BA:3, output B (1)
Step 5: AB found! ABA not in dict → add ABA:4, output AB (2)
...
```
The decoder can rebuild the exact same dictionary from the compressed data! No dictionary needs to be transmitted.
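The trace above fits in a few lines of Python. This is a minimal sketch over string symbols (`lzw_encode` and `lzw_decode` are illustrative names; real GIF additionally packs the codes into variable-width bit fields):

```python
# Minimal LZW: the encoder learns new strings as it goes...
def lzw_encode(text):
    # Dictionary starts with every single character in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    current, output = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch                               # extend the match
        else:
            output.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)  # learn a new string
            current = ch
    if current:
        output.append(dictionary[current])
    return output

# ...and the decoder relearns exactly the same dictionary.
def lzw_decode(codes, alphabet):
    inverse = {i: ch for i, ch in enumerate(sorted(alphabet))}
    prev = inverse[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # Tricky case: the needed code may be the one about to be defined.
        entry = inverse[code] if code in inverse else prev + prev[0]
        out.append(entry)
        inverse[len(inverse)] = prev + entry[0]         # rebuild dictionary
        prev = entry
    return "".join(out)

codes = lzw_encode("ABABABAB")
print(codes)                            # [0, 1, 2, 4, 1]
print(lzw_decode(codes, set("AB")))     # ABABABAB
```

Note that the decoder is handed only the codes and the starting alphabet, never the learned dictionary, yet it reconstructs the input exactly.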
The killer feature: Animation! GIF supports multiple frames, making it the internet's favourite looping format for decades.
Key idea: Keep all colours, predict each pixel, compress the residuals.
PNG applies a prediction filter to each row. Instead of storing raw values, it stores the difference between each pixel and a prediction:
| Filter | Prediction | Description |
|---|---|---|
| None | 0 | Raw value |
| Sub | Pixel to the left | Horizontal prediction |
| Up | Pixel above | Vertical prediction |
| Average | Mean of left and above | Diagonal compromise |
| Paeth | Nearest of left, above, upper-left | Adaptive (Alan Paeth) |
💡 "The filter doesn't compress anything itself. It transforms the data so the actual compressor (DEFLATE) can work more efficiently. Like pre-chewing food for a baby."
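The cleverest of the filters, the Paeth predictor (PNG filter type 4), estimates p = a + b − c from three neighbours, then returns whichever neighbour is closest to that estimate. A sketch:

```python
# Paeth predictor: a = left, b = above, c = upper-left.
# Ties break in the order left, above, upper-left (as in the PNG spec).
def paeth_predict(a, b, c):
    p = a + b - c                      # initial linear estimate
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

print(paeth_predict(10, 20, 10))   # 20: on a vertical edge, "above" wins
print(paeth_predict(20, 10, 10))   # 20: on a horizontal edge, "left" wins
```

The predictor adapts per pixel: near a vertical edge it behaves like the Up filter, near a horizontal edge like the Sub filter.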
| Format | Year | Compression | Lossy? | Colours | Transparency | Best for |
|---|---|---|---|---|---|---|
| BMP | 1986 | None/RLE | No | 16M+ | No | Nothing (obsolete) |
| GIF | 1987 | LZW | No* | 256 | 1-bit | Animations, logos |
| TIFF | 1986 | Various | Optional | 16M+ | Yes | Professional/scientific |
| PNG | 1996 | DEFLATE | No | 16M+ | Full alpha | Screenshots, web graphics |
| JPEG | 1992 | DCT+Huffman | Yes | 16M | No | Photographs |
*GIF's palette reduction is technically lossy, but the compression step itself is lossless.
Shannon's Source Coding Theorem (1948) proves you cannot compress data below its entropy β the intrinsic information content. For photographs, the entropy per pixel is high.
No lossless method can compress a typical photo below about 2:1 to 4:1. This is a mathematical law, not a technological limitation.
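Shannon's formula can be checked directly. A small Python sketch (`entropy_bits_per_symbol` is an illustrative name) computes H for two toy "images":

```python
from collections import Counter
from math import log2

# Shannon entropy: H = -sum(p_i * log2(p_i)) over symbol frequencies.
def entropy_bits_per_symbol(data):
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A highly repetitive source compresses well...
print(round(entropy_bits_per_symbol("AAAAAAAB"), 3))   # 0.544
# ...but eight equally likely symbols need the full 3 bits each.
print(entropy_bits_per_symbol("ABCDEFGH"))             # 3.0
```

The repetitive string could in principle be squeezed to about half a bit per symbol; the "noisy" one cannot be compressed losslessly at all.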
To reach 10:1 or 50:1, we must lose something. JPEG's genius: it throws away precisely what your brain won't miss.
JPEG is built on psychophysics β the science of how physical stimuli create perceptual experiences.
Weber's Law (1834): The just-noticeable difference (JND) in a stimulus is proportional to the stimulus magnitude. $$\frac{\Delta I}{I} \approx \text{constant} \approx 0.02 \text{ (for brightness)}$$
Your eye is not equally sensitive to all detail. Spatial frequency measures how rapidly brightness changes:
| Low frequency | Medium frequency | High frequency |
|---|---|---|
| Smooth gradients, sky, skin | Edges, textures, hair | Noise, fine detail |
| 👁️ Very sensitive | 👁️ Most sensitive | 👁️ Least sensitive |
At high spatial frequencies (fine detail), your sensitivity drops dramatically. JPEG exploits this ruthlessly.
Your eye is far more sensitive to brightness (luminance) than to colour (chrominance). JPEG converts from RGB to YCbCr:
| Channel | Meaning | Sensitivity |
|---|---|---|
| Y | Luminance (brightness) | ★★★★★ High |
| Cb | Blue-difference chrominance | ★★☆☆☆ Low |
| Cr | Red-difference chrominance | ★★☆☆☆ Low |
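The conversion itself, with the coefficients given in the JFIF specification (R, G, B and the results all in the range 0–255):

$$\begin{aligned} Y &= 0.299\,R + 0.587\,G + 0.114\,B \\ C_b &= 128 - 0.1687\,R - 0.3313\,G + 0.5\,B \\ C_r &= 128 + 0.5\,R - 0.4187\,G - 0.0813\,B \end{aligned}$$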
Notice green contributes the most (0.587) – the eye's overall sensitivity to brightness peaks in the green part of the spectrum, around 555 nm.
After conversion, JPEG downsamples the Cb and Cr channels, keeping only one colour sample for every 2×2 block of pixels (4:2:0). This halves the total data (1.5 bytes per pixel instead of 3), and you can barely tell.
💡 "Imagine telling a painter: 'Use detailed brushwork for light and shadow, but slap the colour on roughly.' That's chroma subsampling."
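A sketch of the 2×2 averaging this implies (a simplification: real encoders may use different downsampling filters):

```python
# 4:2:0 subsampling sketch: keep every Y value, but average each
# 2x2 block of a chroma channel down to a single sample.
def subsample_420(channel):
    h, w = len(channel), len(channel[0])
    return [[(channel[y][x] + channel[y][x + 1] +
              channel[y + 1][x] + channel[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

cb = [[100, 102, 200, 198],
      [ 98, 100, 202, 200],
      [ 50,  50,  50,  50],
      [ 50,  50,  50,  50]]
small = subsample_420(cb)
print(small)   # [[100, 200], [50, 50]]
# Per pixel: 1 (Y) + 1/4 (Cb) + 1/4 (Cr) = 1.5 bytes instead of 3.
```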
The simplest compression: if a value repeats, store the value and the count.
Compress our flag's first row:
```
Original: BLUE BLUE YELLOW BLUE
RLE:      2×BLUE 1×YELLOW 1×BLUE
```
| Representation | Bytes |
|---|---|
| Original (3 bytes Γ 4 pixels) | 12 |
| RLE: (2, 0,70,173), (1, 254,204,0), (1, 0,70,173) | 12 |
No savings! But the all-yellow row: 4×(254,204,0) = 4 bytes instead of 12 – a 3:1 ratio.
In Morse code, the most common letter (E) gets the shortest code (a single dot). Huffman coding does this optimally: short codes for common symbols, long codes for rare ones.
Claude Shannon (1948) defined the minimum average bits per symbol:
$$H = -\sum_{i} p_i \log_2 p_i$$

The algorithm (greedy, bottom-up):
1. Create a leaf node for each symbol, weighted by its frequency.
2. Remove the two lowest-weight nodes and join them under a new parent whose weight is their sum.
3. Repeat until one tree remains, then label each left branch 0 and each right branch 1; a symbol's code is the path from the root to its leaf.
❓ "If black is '0' and grey is '10', how does the decoder know whether '0' is complete or the start of '10'?"
Huffman codes are prefix-free: no code is a prefix of any other. The decoder reads bits left-to-right and decodes unambiguously – no separators needed!
| Metric | Value |
|---|---|
| Type | Lossless |
| Optimality | Best possible prefix-free code |
| Used by | JPEG, PNG (via DEFLATE), ZIP, MP3 |
| Limitation | Must assign whole-number bit lengths |
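A compact Huffman sketch using Python's `heapq` (illustrative, not JPEG's exact tables), run on the letter frequencies of MISSISSIPPI:

```python
import heapq

# Build the tree bottom-up: repeatedly merge the two least frequent
# nodes, then read codes off the tree (left branch = 0, right = 1).
def huffman_codes(freqs):
    # Heap entries are (weight, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes

freqs = {"M": 1, "I": 4, "S": 4, "P": 2}   # letters of MISSISSIPPI
codes = huffman_codes(freqs)
avg = sum(freqs[s] * len(codes[s]) for s in freqs) / sum(freqs.values())
print(codes)
print(round(avg, 2))   # 1.91 bits/char, close to the entropy of ~1.82
```

Common letters (I, S) get 1-2 bit codes; the rare M gets 3 bits, and the average lands just above the entropy bound, as Shannon predicts.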
The DCT looks at an image the way a musician looks at a chord β breaking it into component frequencies.
Joseph Fourier (1807) discovered:
Any periodic function can be written as a sum of sines and cosines.
This works for sound (audio frequencies) and images (spatial frequencies).
JPEG divides the image into 8×8 blocks (64 pixels each) and transforms each independently. Why 8×8? Small enough to be fast, large enough for meaningful frequency content, and $8 = 2^3$ is friendly for fast algorithms.
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\, \cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16}$$

where $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ for $k > 0$.
🗣️ "Don't panic! What matters is what it DOES."
Each coefficient $(u,v)$ corresponds to a cosine "wave pattern." Position $(0,0)$ is the DC coefficient (average brightness). Higher positions = finer detail.
The DCT doesn't compress anything. It reorganises information so that important parts (low frequencies) separate from unimportant parts (high frequencies). This sets the stage for: throw the unimportant parts away.
💡 "For natural photographs, most high-frequency DCT coefficients are nearly zero. The energy is concentrated in a few low-frequency terms."
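A direct, unoptimised implementation of the 8×8 DCT JPEG uses (real codecs use fast factorisations; JPEG also level-shifts pixels by subtracting 128 first, skipped here for clarity):

```python
from math import cos, pi, sqrt

# Direct (slow) 8x8 DCT-II: each output coefficient is a weighted sum
# of all 64 pixels against a cosine wave pattern.
def dct_8x8(block):
    def C(k):
        return 1 / sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[x][y]
                    * cos((2 * x + 1) * u * pi / 16)
                    * cos((2 * y + 1) * v * pi / 16)
                    for x in range(8) for y in range(8))
            out[u][v] = 0.25 * C(u) * C(v) * s
    return out

flat = [[128] * 8 for _ in range(8)]   # a perfectly flat grey block
coeffs = dct_8x8(flat)
print(round(coeffs[0][0]))   # 1024: all the energy sits in the DC term
print(round(coeffs[3][5]))   # 0: no detail means no higher frequencies
```

A flat block produces exactly one nonzero coefficient; every wiggle in the pixels spreads energy into the higher-frequency slots.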
Each DCT coefficient is divided by a value from a quantization matrix and rounded:
$$F_q(u,v) = \text{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right)$$

The standard JPEG luminance quantization matrix:
```
Q = | 16  11  10  16  24  40  51  61 |
    | 12  12  14  19  26  58  60  55 |
    | 14  13  16  24  40  57  69  56 |
    | 14  17  22  29  51  87  80  62 |
    | 18  22  37  56  68 109 103  77 |
    | 24  35  55  64  81 104 113  92 |
    | 49  64  78  87 103 121 120 101 |
    | 72  92  95  98 112 100 103  99 |
```
❗ "Top-left values are small (preserve low frequencies). Bottom-right values are large (destroy high frequencies). This IS the psychovisual model."
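Quantization in code, using the standard luminance matrix (the coefficient values in `F` below are made up for illustration):

```python
# Divide each DCT coefficient by its matrix entry and round: small
# divisors (top-left) keep precision, large divisors (bottom-right)
# round most coefficients to zero.
Q = [[16, 11, 10, 16, 24, 40, 51, 61],
     [12, 12, 14, 19, 26, 58, 60, 55],
     [14, 13, 16, 24, 40, 57, 69, 56],
     [14, 17, 22, 29, 51, 87, 80, 62],
     [18, 22, 37, 56, 68, 109, 103, 77],
     [24, 35, 55, 64, 81, 104, 113, 92],
     [49, 64, 78, 87, 103, 121, 120, 101],
     [72, 92, 95, 98, 112, 100, 103, 99]]

def quantize(F, Q):
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

F = [[0.0] * 8 for _ in range(8)]        # a mostly-empty coefficient block
F[0][0], F[0][1], F[7][7] = 700.0, -290.0, 30.0
Fq = quantize(F, Q)
# Low frequencies survive, the high-frequency 30 is wiped out:
print(Fq[0][0], Fq[0][1], Fq[7][7])   # 44 -26 0
```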
After quantization, the block is full of zeros. JPEG reads coefficients in zigzag order to group zeros together:
```
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
```
Then it applies RLE on zeros + Huffman coding on the remaining values. The DC coefficient stores only the difference from the previous block.
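The zigzag order can be generated by walking the anti-diagonals of the block and alternating direction on each one. A sketch:

```python
# Visit the 8x8 block diagonal by diagonal (u + v = d), reversing
# every other diagonal to produce the zigzag.
def zigzag_order(n=8):
    order = []
    for d in range(2 * n - 1):
        cells = [(u, d - u) for u in range(n) if 0 <= d - u < n]
        order.extend(cells if d % 2 else reversed(cells))
    return order

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```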
| Stage | Data | Size (bits) |
|---|---|---|
| Original | 64 pixel values Γ 8 bits | 512 |
| After DCT | 64 coefficients (reorganised) | 512 |
| After quantize | 44, −26, 0, −1, then 60 zeros | ~64 values, mostly zeros |
| After zigzag + RLE | (0,44), (0,−26), (1,−1), EOB | 4 symbols |
| After Huffman | Compact bit string | ~30 |
JPEG quality ≈ 5%: Almost every DCT coefficient becomes zero except the DC term (average brightness per block). The image degrades into a mosaic of 8×8 solid-colour tiles.
Rule: Never save text, screenshots, or logos as JPEG. Use PNG for sharp edges, JPEG for photos.
| Fact | Value |
|---|---|
| Bits per pixel (RGB) | 24 |
| 12 MP photo as BMP | 36 MB |
| 12 MP photo as PNG | ~12 MB |
| 12 MP photo as JPEG (Q85) | ~3 MB |
| 12 MP photo as JPEG (Q50) | ~1.5 MB |
| Human cone types | 3 (S, M, L) |
| DCT block size | 8 × 8 = 64 pixels |
| Shannon entropy formula | $H = -\sum p_i \log_2 p_i$ |
| Method | Key Idea | Type | Best ratio | Prerequisites |
|---|---|---|---|---|
| RLE | Count repeated values | Lossless | ~10:1 (graphics) | Counting |
| Huffman | Short codes for common symbols | Lossless | ~4:1 | Probability, trees |
| DCT + Quantize | Separate frequencies, discard invisible | Lossy | ~20:1 | Trig, cosines |
| Full JPEG | All of the above, orchestrated | Lossy | ~50:1 | All above |
| Discipline | Contribution |
|---|---|
| Physics | Light as EM waves; trichromaticity |
| Psychophysics | Weber's Law; contrast sensitivity; chroma insensitivity |
| Information theory | Shannon entropy; limits of lossless compression |
| Mathematics | Fourier/DCT transforms; Huffman's theorem |
| Engineering | 8×8 blocks; zigzag scan; quantization tables |
"JPEG compression is a mathematical theory of what humans don't notice. It's one of the most elegant examples in all of science: understanding perception well enough to exploit it."
1. Build a Huffman code for ABRACADABRA. What are the codes? Average bits/char?
2. Compress AAABBBCCDDDDDDEE with RLE. How many bytes RLE vs. original?
3. The letters of MISSISSIPPI have frequencies: M=1, I=4, S=4, P=2. Compute $H$. How close does Huffman get?

This lecture was generated for a Math Circle session.