Ages: 11–14 · Duration: 105 minutes · Topics: Binary encoding, ASCII, Unicode, Error detection & correction, Galois fields, Reed-Solomon codes, QR code structure
"You've scanned hundreds of QR codes β restaurant menus, concert tickets, WiFi passwords. But have you ever stopped to ask: how does a camera turn a jumble of tiny squares into a website? Today, we find out β and by the end, you'll know enough to build one by hand."
Let's start with a game you already know.
🗣️ "I'm thinking of a number between 1 and 100. You can ask yes-or-no questions. How many questions do you need to guarantee you find it?"
The trick: cut the possibilities in half each time. Seven questions suffice for 1–100 because $2^7 = 128 > 100$.
Each yes/no answer is one bit of information (short for binary digit). A bit has exactly two values, 0 or 1: like a light switch, a coin flip, or a black-vs-white square.
| Items | Bits needed | Because… |
|---|---|---|
| 2 | 1 | $2^1 = 2$ |
| 4 | 2 | $2^2 = 4$ |
| 8 | 3 | $2^3 = 8$ |
| 26 letters | 5 | $2^5 = 32 \ge 26$ |
| 256 symbols | 8 | $2^8 = 256$ |
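The "Bits needed" column is just the base-2 logarithm, rounded up. A quick Python sketch (the helper name `bits_needed` is ours):

```python
import math

def bits_needed(n_items: int) -> int:
    """Smallest b such that 2**b >= n_items."""
    return math.ceil(math.log2(n_items))

print(bits_needed(2))    # 1
print(bits_needed(26))   # 5  (enough for the alphabet)
print(bits_needed(100))  # 7  (the guessing game)
print(bits_needed(256))  # 8  (one byte)
```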
In 1963, American engineers agreed on a standard code called ASCII, the American Standard Code for Information Interchange. The idea: assign each character a number from 0 to 127.
Compare 'A' and 'a' bit by bit:

```
'A' = 0 1 0 0 0 0 0 1   (65)
'a' = 0 1 1 0 0 0 0 1   (97)
          ^
          only this bit differs!
```
The difference: $97 - 65 = 32 = 2^5$. Exactly one bit, bit 5, controls uppercase vs. lowercase.
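You can watch bit 5 do its job: XOR-ing an ASCII code with 32 toggles the case of a letter.

```python
# Bit 5 (value 32) is the case bit in ASCII: XOR with 32 flips case.
CASE_BIT = 1 << 5  # 32

print(chr(ord('A') ^ CASE_BIT))  # a
print(chr(ord('a') ^ CASE_BIT))  # A
print(chr(ord('Q') ^ CASE_BIT))  # q
```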
Unicode is the modern standard, with over 150,000 characters. ASCII is a perfect subset of UTF-8, so everything we learn today still works! An emoji such as 😀 takes 4 bytes (32 bits) in UTF-8.
You're given an empty $21 \times 21$ grid. Your mission: fill in the squares to encode the letter 'A' so a phone camera can read it. Then show what changes for 'B' and 'a'.
A Version 1 QR code is a $21 \times 21$ grid of modules. Not all carry data; some are fixed furniture:
| Region | Purpose | How to spot it |
|---|---|---|
| Finder patterns (×3) | "I'm a QR code!" + orientation | Three big squares in corners |
| Timing patterns | Ruler of alternating stripes | Row 6 and column 6 |
| Format information | Error level + mask pattern | Adjacent to finders |
| Data + EC area | Your actual message + error correction | Everything else |
| Field | Bits | Content | For 'A' |
|---|---|---|---|
| Mode indicator | 4 | 0100 = Byte mode | 0100 |
| Character count | 8 | How many characters | 00000001 |
| Character data | 8 | The ASCII value | 01000001 |
| Terminator | 4 | End marker | 0000 |
| Char | Byte 1 | Byte 2 | Byte 3 | Bits changed vs. 'A' |
|---|---|---|---|---|
| A | 0x40 | 0x14 | 0x10 | – |
| B | 0x40 | 0x14 | 0x20 | 2 (byte 3) |
| a | 0x40 | 0x16 | 0x10 | 1 (byte 2) |
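A short Python sketch (helper name ours) that packs the four fields and reproduces the byte columns above:

```python
def encode_byte_mode(ch: str) -> bytes:
    """Pack mode(4) + count(8) + data(8) + terminator(4) into 3 bytes."""
    bits = "0100"                       # mode indicator: byte mode
    bits += format(1, "08b")            # character count: 1
    bits += format(ord(ch), "08b")      # the ASCII value
    bits += "0000"                      # terminator
    return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))

for ch in "ABa":
    print(ch, encode_byte_mode(ch).hex())
# A 401410
# B 401420
# a 401610
```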
| Channel | What goes wrong |
|---|---|
| QR code | Scratches, dirt, camera blur, bad printing |
| WiFi | Electromagnetic interference |
| CD/DVD | Physical scratches |
| Deep space | Cosmic rays (Voyager, 24 billion km away!) |
The Hamming distance between two bit strings = the number of positions where they differ.
| Pair | Hamming distance |
|---|---|
| A ↔ a | 1 |
| A ↔ B | 2 |
| A ↔ C | 1 |
| A ↔ E | 1 |
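A one-line Python helper (name ours) reproduces the table:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two characters' ASCII codes."""
    return bin(ord(a) ^ ord(b)).count("1")

print(hamming_distance('A', 'a'))  # 1
print(hamming_distance('A', 'B'))  # 2
print(hamming_distance('A', 'C'))  # 1
print(hamming_distance('A', 'E'))  # 1
```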
| Level | Recovery | Use case |
|---|---|---|
| L | ~7% | Clean environment |
| M | ~15% | General use |
| Q | ~25% | Outdoor / industrial |
| H | ~30% | Harsh conditions; logos OK! |
The simplest fix: send every bit three times. Decode by majority vote.
```
Original: 0 1 0 0 0 0 0 1
Tripled:  000 111 000 000 000 000 000 111   (24 bits for 8!)
```
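A minimal Python sketch of triple-and-vote (function names ours):

```python
def triple(bits: str) -> str:
    """Repeat every bit three times."""
    return "".join(b * 3 for b in bits)

def majority_decode(tripled: str) -> str:
    """Take each group of three and keep whichever bit appears at least twice."""
    out = []
    for i in range(0, len(tripled), 3):
        chunk = tripled[i:i+3]
        out.append("1" if chunk.count("1") >= 2 else "0")
    return "".join(out)

sent = triple("01000001")            # 'A', 24 bits
noisy = "100" + sent[3:]             # flip one bit in the first triplet
print(majority_decode(noisy))        # 01000001 -- error corrected
```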
Add one extra bit that makes the total number of 1s always even.
For 'A' = 01000001: two 1s (even) → parity bit = 0 → send 01000001 0.
For 'a' = 01100001: three 1s (odd) → parity bit = 1 → send 01100001 1.
If any single bit flips, the total number of 1s becomes odd: error detected!
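The same parity rule in Python (function names ours):

```python
def add_parity(bits: str) -> str:
    """Append one bit so the total number of 1s is even."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def check_parity(bits: str) -> bool:
    """True if the number of 1s is even, i.e. no single-bit error detected."""
    return bits.count("1") % 2 == 0

print(add_parity("01000001"))     # 010000010  ('A': two 1s -> parity 0)
print(add_parity("01100001"))     # 011000011  ('a': three 1s -> parity 1)
print(check_parity("110000010"))  # False -- a flipped first bit is detected
```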
4 data bits + 3 parity bits = 7 total. Parity bits sit at positions 1, 2, 4 (powers of 2). Each covers positions whose binary representation has a 1 in that column.
Protect the first nibble of 'A' = 0100:
$d_1=0, d_2=1, d_3=0, d_4=0$
Parity bits: $p_1 = 0 \oplus 1 \oplus 0 = 1$, $p_2 = 0 \oplus 0 \oplus 0 = 0$, $p_3 = 1 \oplus 0 \oplus 0 = 1$.
Codeword: 1 0 0 1 1 0 0
If noise flips position 5, the received word is 1 0 0 1 0 0 0.
Syndrome: $s_3 s_2 s_1 = 101_2 = 5$, so the error is at position 5! Flip it back. ✓
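The whole encode-corrupt-correct round trip fits in a short Python sketch (function names ours; bit layout p1 p2 d1 p3 d2 d3 d4 as in the example above):

```python
def hamming74_encode(d: str) -> str:
    """Encode 4 data bits; layout p1 p2 d1 p3 d2 d3 d4 (parity at 1, 2, 4)."""
    d1, d2, d3, d4 = (int(b) for b in d)
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return "".join(str(b) for b in (p1, p2, d1, p3, d2, d3, d4))

def hamming74_correct(cw: str) -> str:
    """Locate and fix a single flipped bit via the syndrome."""
    bits = [int(b) for b in cw]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    pos = s3 * 4 + s2 * 2 + s1          # 0 means no error
    if pos:
        bits[pos - 1] ^= 1
    return "".join(str(b) for b in bits)

cw = hamming74_encode("0100")    # first nibble of 'A'
print(cw)                        # 1001100
noisy = cw[:4] + "0" + cw[5:]    # flip position 5
print(hamming74_correct(noisy))  # 1001100 -- restored
```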
Hamming codes correct single-bit errors. But a coffee stain on a QR code doesn't flip one lonely bit; it obliterates entire bytes in a cluster. QR codes need something far more powerful: Reed-Solomon codes.
Forget bits for a moment. Think about polynomials.
A polynomial of degree $n$ is uniquely determined by $n + 1$ points. A straight line (degree 1) passes through 2 points. A parabola (degree 2) needs 3. If you have more points than the minimum, you can afford to lose some and still recover the curve.
Analogy: imagine you want to transmit the straight line $y = 3x + 5$. Only 2 points define it, but you send 5 points instead. Even if 1 point gets corrupted, the 4 correct points still overdetermine the line. With enough redundancy, you can find and fix the bad point.
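The analogy can be played out in Python: send 5 points of the line, corrupt one, and let the honest points outvote it. (A toy brute-force sketch of the redundancy idea, not how Reed-Solomon actually decodes.)

```python
from itertools import combinations

# Send 5 points of y = 3x + 5; one arrives corrupted.
points = [(x, 3 * x + 5) for x in range(5)]
points[2] = (2, 99)  # corruption

# Try every pair as the candidate line; keep the one most points agree with.
best = None
for (x1, y1), (x2, y2) in combinations(points, 2):
    if x1 == x2:
        continue
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    votes = sum(abs(slope * x + intercept - y) < 1e-9 for x, y in points)
    if best is None or votes > best[0]:
        best = (votes, slope, intercept)

print(best)  # (4, 3.0, 5.0) -- four honest points outvote the bad one
```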
Here's the catch: ordinary arithmetic breaks polynomial codes. Real-number coordinates bring fractions and rounding errors, and integer coordinates overflow a byte. We need a number system where:

- every value fits in exactly one byte (256 possibilities),
- addition, subtraction, multiplication, and division are always defined,
- every result is exact: no rounding, no overflow, ever.

Such a system exists. Mathematicians call it a Galois field.
GF(256), also written $\mathbb{F}_{256}$ or $\text{GF}(2^8)$, is a finite field with exactly 256 elements. "GF" stands for Galois Field, named after Évariste Galois, the French mathematician who invented the theory at age 18 (and tragically died in a duel at 20).
Each element of GF(256) is a byte, a value from 0 to 255. But we think of it as a polynomial of degree ≤ 7 with binary coefficients:
$$\text{byte } b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0 \;\longleftrightarrow\; b_7 x^7 + b_6 x^6 + \cdots + b_1 x + b_0$$

For example:
| Byte (decimal) | Binary | Polynomial |
|---|---|---|
| 0 | 00000000 | $0$ |
| 1 | 00000001 | $1$ |
| 2 | 00000010 | $x$ |
| 3 | 00000011 | $x + 1$ |
| 19 | 00010011 | $x^4 + x + 1$ |
| 255 | 11111111 | $x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$ |
Addition is the simplest part: it's just bitwise XOR! Coefficients are binary (0 or 1) and $1 + 1 = 0$ mod 2 (no carrying!), so adding two polynomials is pure XOR:

$$19 \oplus 3 = \texttt{00010011} \oplus \texttt{00000011} = \texttt{00010000} = 16$$

In polynomial form: $(x^4 + x + 1) + (x + 1) = x^4$. The $x$ and constant terms cancel!
Multiplication is trickier. We multiply the polynomials normally, but then reduce modulo an irreducible polynomial of degree 8:

$$p(x) = x^8 + x^4 + x^3 + x^2 + 1 \quad \text{(binary: \texttt{100011101} = 0x11D)}$$

This polynomial is to GF(256) what a prime number is to modular arithmetic: it can't be factored, and dividing by it gives well-behaved remainders.
Example: Multiply $3 \times 7$ in GF(256).
$3 = x + 1$ and $7 = x^2 + x + 1$.
$$ (x + 1)(x^2 + x + 1) = x^3 + x^2 + x + x^2 + x + 1 = x^3 + 1 = 9 $$

(Remember: $x^2 + x^2 = 0$ and $x + x = 0$; they cancel.) The result is 9, already below degree 8, so no reduction needed.
When do we reduce? When the product has degree ≥ 8, we divide by $p(x)$ and keep the remainder, just like clock arithmetic but with polynomials. This guarantees the result is always a valid byte (0–255).
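The whole multiply-then-reduce routine is a few lines of Python (the classic shift-and-XOR loop; function name ours):

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the QR reduction polynomial

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(256): shift-and-XOR, reducing whenever degree hits 8."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # "add" = XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree reached 8: subtract p(x)
            a ^= POLY
    return result

print(gf_mul(3, 7))    # 9, as in the worked example
print(19 ^ 3)          # 16 -- addition is plain XOR
```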
GF(256) has a special property: the element $\alpha = 2$ (i.e., $x$) is a generator. If you keep multiplying by 2:
$$\alpha^0 = 1, \quad \alpha^1 = 2, \quad \alpha^2 = 4, \quad \alpha^3 = 8, \quad \ldots$$

The powers cycle through all 255 nonzero elements before returning to 1 at $\alpha^{255} = 1$. This means:

- every nonzero byte is some power $\alpha^i$, so it has a "logarithm" $i$;
- multiplication turns into addition of exponents: $\alpha^i \cdot \alpha^j = \alpha^{(i + j) \bmod 255}$;
- division turns into subtraction of exponents: $\alpha^i / \alpha^j = \alpha^{(i - j) \bmod 255}$.
This is how QR decoders multiply efficiently: just look up the exponents in a table, add them, and look up the result. A $256 \times 256$ multiplication table compressed into two 256-entry tables!
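A sketch of those two tables in Python (the names `EXP` and `LOG` are ours; `EXP` is doubled in length so exponent sums need no modulo):

```python
POLY = 0x11D

# Build exp/log tables by repeatedly multiplying by the generator alpha = 2.
EXP = [0] * 512          # doubled so exponent sums need no "mod 255"
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value       # EXP[i] = alpha^i
    LOG[value] = i       # LOG[alpha^i] = i
    value <<= 1
    if value & 0x100:
        value ^= POLY
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Table-driven multiply: add exponents, look up the power."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf_mul(3, 7))   # 9 -- matches the shift-and-XOR method
```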
The size is no coincidence. $256 = 2^8$, and one byte is 8 bits. By working in GF($2^8$), every field element fits in exactly one byte. No overflow, no fractions, no wasted space. This is why Reed-Solomon is so natural for digital data: the math and the hardware speak the same language.
Now we have our number system. Here's how Reed-Solomon actually protects a QR message.
Your $k$ data codewords $d_0, d_1, \ldots, d_{k-1}$ become coefficients of a polynomial:
$$m(x) = d_0 \, x^{k-1} + d_1 \, x^{k-2} + \cdots + d_{k-1}$$

For a Version 1-L QR code: $k = 19$ data codewords, so $m(x)$ has degree 18.
To produce $t$ error correction codewords, we build a generator polynomial $g(x)$ of degree $t$:
$$g(x) = (x - \alpha^0)(x - \alpha^1)(x - \alpha^2) \cdots (x - \alpha^{t-1})$$

Remember: subtraction = XOR in GF(256), so $x - \alpha^i = x + \alpha^i$ here. For Version 1-L with $t = 7$:
$$g(x) = (x + 1)(x + 2)(x + 4)(x + 8)(x + 16)(x + 32)(x + 64)$$

Multiplying these out (in GF(256) arithmetic) gives a degree-7 polynomial with known coefficients.
Shift $m(x)$ up by multiplying by $x^t$, then divide by $g(x)$:
$$x^t \cdot m(x) = q(x) \cdot g(x) + r(x)$$

The remainder $r(x)$ has degree $< t$; its coefficients are your error correction codewords!
The full transmitted codeword is:
$$c(x) = x^t \cdot m(x) - r(x)$$

By construction, $g(x)$ divides $c(x)$ evenly. The receiver checks this: if the remainder isn't zero, errors occurred, and the math can pinpoint exactly which bytes are wrong.
Let's trace a small example with $k = 1$ data byte and $t = 4$ EC bytes.
Data: ASCII 65 = 0x41. Message polynomial: $m(x) = 65$.
Generator: $g(x) = (x + 1)(x + 2)(x + 4)(x + 8)$ β a degree-4 polynomial in GF(256).
Shift: $x^4 \cdot m(x) = 65\,x^4$.
Divide $65\,x^4$ by $g(x)$ in GF(256) arithmetic; the remainder is $r(x) = r_3 x^3 + r_2 x^2 + r_1 x + r_0$.
Those four remainder coefficients are the error correction bytes. The transmitted block is: $[65, r_3, r_2, r_1, r_0]$.
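The worked example can be checked end to end in Python (function names ours; a sketch of the standard polynomial-division encoder, not production code):

```python
POLY = 0x11D

def gf_mul(a: int, b: int) -> int:
    """GF(256) multiply: shift-and-XOR, reducing modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result

def rs_generator(t: int) -> list[int]:
    """g(x) = (x + 1)(x + 2)(x + 4)...(x + 2^(t-1)), highest degree first."""
    g = [1]
    alpha_i = 1
    for _ in range(t):
        g = [c ^ (gf_mul(g[j - 1], alpha_i) if j else 0)
             for j, c in enumerate(g + [0])]
        alpha_i = gf_mul(alpha_i, 2)
    return g

def rs_ec_bytes(data: list[int], t: int) -> list[int]:
    """Remainder of x^t * m(x) divided by g(x): the EC codewords."""
    g = rs_generator(t)
    rem = data + [0] * t
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j in range(len(g)):
                rem[i + j] ^= gf_mul(g[j], factor)
    return rem[-t:]

print(rs_generator(4))       # [1, 15, 54, 120, 64]
print(rs_ec_bytes([65], 4))  # the four EC bytes for 'A'
```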
When the receiver gets a (possibly corrupted) message, the Reed-Solomon decoder:

1. evaluates the received polynomial at $\alpha^0, \alpha^1, \ldots, \alpha^{t-1}$ to compute the syndromes (all zero means no errors);
2. uses the syndromes to work out which byte positions are wrong;
3. computes the error values and XORs them away, restoring the original message.
Here's what each QR error correction level allocates for Version 1 (21Γ21, 26 total codewords):
| Level | Data codewords | EC codewords | Correctable bytes | Recovery |
|---|---|---|---|---|
| L | 19 | 7 | 3 | ~7% |
| M | 16 | 10 | 5 | ~15% |
| Q | 13 | 13 | 6 | ~25% |
| H | 9 | 17 | 8 | ~30% |
At level H, 17 out of 26 codewords are error correction β you can obliterate nearly a third of the code and it still scans. This is why companies can slap logos right in the middle of QR codes!
| Method | Rate | Corrects | Used by |
|---|---|---|---|
| Repetition 🥉 | $\frac{1}{3}$ | 1 per triplet | (Toy example) |
| Parity 🥈 | $\frac{8}{9}$ | Detects only | Credit cards, ISBN |
| Hamming(7,4) 🥇 | $\frac{4}{7}$ | 1 per block | ECC memory |
| Reed-Solomon 🏆 | Varies | Burst errors | QR, CD, DVD, Voyager |
Answers:

- Repetition decoding: 1010101, 1111111, 0000000.
- Parity bits: 1010101 0 (4 ones), 1111111 1 (8 ones), 0000000 0 (0 ones).
- Byte-mode encoding: 0100 00000010 01001000 01101001 0000 = 32 bits = 4 codewords.