Ages: 11–14 · Duration: 105 minutes · Topics: Binary encoding, ASCII, Unicode, Error detection & correction, Galois fields, Reed-Solomon codes, QR code structure
"You've scanned hundreds of QR codes β restaurant menus, concert tickets, WiFi passwords. But have you ever stopped to ask: how does a camera turn a jumble of tiny squares into a website? Today, we find out β and by the end, you'll know enough to build one by hand."
Let's start with a game you already know.
🗣️ "I'm thinking of a number between 1 and 100. You can ask yes-or-no questions. How many questions do you need to guarantee you find it?"
The trick: cut the possibilities in half each time. Seven questions suffice for 1–100 because $2^7 = 128 > 100$.
Each yes/no answer is one bit of information (short for binary digit). A bit has exactly two values, 0 or 1: like a light switch, a coin flip, or a black-vs-white square.
| Items | Bits needed | Because… |
|---|---|---|
| 2 | 1 | $2^1 = 2$ |
| 4 | 2 | $2^2 = 4$ |
| 8 | 3 | $2^3 = 8$ |
| 26 letters | 5 | $2^5 = 32 \ge 26$ |
| 256 symbols | 8 | $2^8 = 256$ |
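The "Bits needed" column is just the base-2 logarithm, rounded up. A quick Python sketch (the helper name `bits_needed` is ours):

```python
import math

def bits_needed(n_items: int) -> int:
    """Smallest b such that 2**b >= n_items."""
    return math.ceil(math.log2(n_items))

print(bits_needed(2))    # 1
print(bits_needed(26))   # 5  (enough for the alphabet)
print(bits_needed(100))  # 7  (the guessing game)
print(bits_needed(256))  # 8  (one byte)
```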
In 1963, American engineers agreed on a standard code called ASCII, the American Standard Code for Information Interchange. The idea: assign each character a number from 0 to 127.
Compare 'A' and 'a' bit by bit:

```
'A' = 0 1 0 0 0 0 0 1   (65)
'a' = 0 1 1 0 0 0 0 1   (97)
          ^
          only this bit differs!
```
The difference: $97 - 65 = 32 = 2^5$. Exactly one bit, bit 5, controls uppercase vs. lowercase.
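You can watch bit 5 do its job: XOR-ing an ASCII code with 32 toggles the case of a letter.

```python
# Bit 5 (value 32) is the case bit in ASCII: XOR with 32 flips case.
CASE_BIT = 1 << 5  # 32

print(chr(ord('A') ^ CASE_BIT))  # a
print(chr(ord('a') ^ CASE_BIT))  # A
print(chr(ord('Q') ^ CASE_BIT))  # q
```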
Unicode is the modern standard, with over 150,000 characters. ASCII is a perfect subset of UTF-8, so everything we learn today still works! An emoji such as 😀 takes 4 bytes (32 bits) in UTF-8.
You're given an empty $21 \times 21$ grid. Your mission: fill in the squares to encode the letter 'A' so a phone camera can read it. Then show what changes for 'B' and 'a'.
A Version 1 QR code is a $21 \times 21$ grid of modules. Not all carry data; some are fixed furniture:
| Region | Purpose | How to spot it |
|---|---|---|
| Finder patterns (×3) | "I'm a QR code!" + orientation | Three big squares in corners |
| Timing patterns | Ruler of alternating stripes | Row 6 and column 6 |
| Format information | Error level + mask pattern | Adjacent to finders |
| Data + EC area | Your actual message + error correction | Everything else |
| Field | Bits | Content | For 'A' |
|---|---|---|---|
| Mode indicator | 4 | 0100 = Byte mode | 0100 |
| Character count | 8 | How many characters | 00000001 |
| Character data | 8 | The ASCII value | 01000001 |
| Terminator | 4 | End marker | 0000 |
| Char | Byte 1 | Byte 2 | Byte 3 | Bits changed vs. 'A' |
|---|---|---|---|---|
| A | 0x40 | 0x14 | 0x10 | – |
| B | 0x40 | 0x14 | 0x20 | 2 (byte 3) |
| a | 0x40 | 0x16 | 0x10 | 1 (byte 2) |
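A short Python sketch (helper name ours) that packs the four fields and reproduces the byte columns above:

```python
def encode_byte_mode(ch: str) -> bytes:
    """Pack mode(4) + count(8) + data(8) + terminator(4) into 3 bytes."""
    bits = "0100"                       # mode indicator: byte mode
    bits += format(1, "08b")            # character count: 1
    bits += format(ord(ch), "08b")      # the ASCII value
    bits += "0000"                      # terminator
    return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))

for ch in "ABa":
    print(ch, encode_byte_mode(ch).hex())
# A 401410
# B 401420
# a 401610
```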
| Channel | What goes wrong |
|---|---|
| QR code | Scratches, dirt, camera blur, bad printing |
| WiFi | Electromagnetic interference |
| CD/DVD | Physical scratches |
| Deep space | Cosmic rays (Voyager, 24 billion km away!) |
The Hamming distance between two bit strings = the number of positions where they differ.
| Pair | Hamming distance |
|---|---|
| A ↔ a | 1 |
| A ↔ B | 2 |
| A ↔ C | 1 |
| A ↔ E | 1 |
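A one-line Python helper (name ours) reproduces the table:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of differing bits between two characters' ASCII codes."""
    return bin(ord(a) ^ ord(b)).count("1")

print(hamming_distance('A', 'a'))  # 1
print(hamming_distance('A', 'B'))  # 2
print(hamming_distance('A', 'C'))  # 1
print(hamming_distance('A', 'E'))  # 1
```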
| Level | Recovery | Use case |
|---|---|---|
| L | ~7% | Clean environment |
| M | ~15% | General use |
| Q | ~25% | Outdoor / industrial |
| H | ~30% | Harsh conditions; logos OK! |
The simplest fix: send every bit three times. Decode by majority vote.
```
Original: 0 1 0 0 0 0 0 1
Tripled:  000 111 000 000 000 000 000 111   (24 bits for 8!)
```
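A minimal Python sketch of triple-and-vote (function names ours):

```python
def triple(bits: str) -> str:
    """Repeat every bit three times."""
    return "".join(b * 3 for b in bits)

def majority_decode(tripled: str) -> str:
    """Take each group of three and keep whichever bit appears at least twice."""
    out = []
    for i in range(0, len(tripled), 3):
        chunk = tripled[i:i+3]
        out.append("1" if chunk.count("1") >= 2 else "0")
    return "".join(out)

sent = triple("01000001")            # 'A', 24 bits
noisy = "100" + sent[3:]             # flip one bit in the first triplet
print(majority_decode(noisy))        # 01000001 -- error corrected
```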
Add one extra bit that makes the total number of 1s always even.
For 'A' = 01000001: two 1s (even) → parity bit = 0 → send 01000001 0.
For 'a' = 01100001: three 1s (odd) → parity bit = 1 → send 01100001 1.
If any single bit flips, the total number of 1s becomes odd: error detected!
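The same parity rule in Python (function names ours):

```python
def add_parity(bits: str) -> str:
    """Append one bit so the total number of 1s is even."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def check_parity(bits: str) -> bool:
    """True if the number of 1s is even, i.e. no single-bit error detected."""
    return bits.count("1") % 2 == 0

print(add_parity("01000001"))     # 010000010  ('A': two 1s -> parity 0)
print(add_parity("01100001"))     # 011000011  ('a': three 1s -> parity 1)
print(check_parity("110000010"))  # False -- a flipped first bit is detected
```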
4 data bits + 3 parity bits = 7 total. Parity bits sit at positions 1, 2, 4 (powers of 2). Each covers positions whose binary representation has a 1 in that column.
Protect the first nibble of 'A' = 0100:
$d_1=0, d_2=1, d_3=0, d_4=0$
Parity bits: $p_1 = 0 \oplus 1 \oplus 0 = 1$, $p_2 = 0 \oplus 0 \oplus 0 = 0$, $p_3 = 1 \oplus 0 \oplus 0 = 1$.
Codeword: 1 0 0 1 1 0 0
If noise flips position 5, the received word is 1 0 0 1 0 0 0.
Syndrome: $s_3 s_2 s_1 = 101_2 = 5$, so the error is at position 5! Flip it back. ✓
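The whole encode-corrupt-correct round trip fits in a short Python sketch (function names ours; bit layout p1 p2 d1 p3 d2 d3 d4 as in the example above):

```python
def hamming74_encode(d: str) -> str:
    """Encode 4 data bits; layout p1 p2 d1 p3 d2 d3 d4 (parity at 1, 2, 4)."""
    d1, d2, d3, d4 = (int(b) for b in d)
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return "".join(str(b) for b in (p1, p2, d1, p3, d2, d3, d4))

def hamming74_correct(cw: str) -> str:
    """Locate and fix a single flipped bit via the syndrome."""
    bits = [int(b) for b in cw]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    pos = s3 * 4 + s2 * 2 + s1          # 0 means no error
    if pos:
        bits[pos - 1] ^= 1
    return "".join(str(b) for b in bits)

cw = hamming74_encode("0100")    # first nibble of 'A'
print(cw)                        # 1001100
noisy = cw[:4] + "0" + cw[5:]    # flip position 5
print(hamming74_correct(noisy))  # 1001100 -- restored
```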
Hamming codes correct single-bit errors. But a coffee stain on a QR code doesn't flip one lonely bit; it obliterates entire bytes in a cluster. QR codes need something far more powerful: Reed-Solomon codes.
Forget bits for a moment. Think about polynomials.
A polynomial of degree $n$ is uniquely determined by $n + 1$ points. A straight line (degree 1) passes through 2 points. A parabola (degree 2) needs 3. If you have more points than the minimum, you can afford to lose some and still recover the curve.
Analogy: imagine you want to transmit the straight line $y = 3x + 5$. Only 2 points define it, but you send 5 points instead. Even if 1 point gets corrupted, the 4 correct points still overdetermine the line. With enough redundancy, you can find and fix the bad point.
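The analogy can be played out in Python: send 5 points of the line, corrupt one, and let the honest points outvote it. (A toy brute-force sketch of the redundancy idea, not how Reed-Solomon actually decodes.)

```python
from itertools import combinations

# Send 5 points of y = 3x + 5; one arrives corrupted.
points = [(x, 3 * x + 5) for x in range(5)]
points[2] = (2, 99)  # corruption

# Try every pair as the candidate line; keep the one most points agree with.
best = None
for (x1, y1), (x2, y2) in combinations(points, 2):
    if x1 == x2:
        continue
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    votes = sum(abs(slope * x + intercept - y) < 1e-9 for x, y in points)
    if best is None or votes > best[0]:
        best = (votes, slope, intercept)

print(best)  # (4, 3.0, 5.0) -- four honest points outvote the bad one
```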
Here's the catch: ordinary arithmetic breaks polynomial codes. Real-number coordinates bring fractions and rounding errors, and integer coordinates overflow a byte. We need a number system where:

- every value fits in exactly one byte (256 possibilities),
- addition, subtraction, multiplication, and division are always defined,
- every result is exact: no rounding, no overflow, ever.

Such a system exists. Mathematicians call it a Galois field.
GF(256), also written $\mathbb{F}_{256}$ or $\text{GF}(2^8)$, is a finite field with exactly 256 elements. "GF" stands for Galois Field, named after Évariste Galois, the French mathematician who invented the theory at age 18 (and tragically died in a duel at 20).
Each element of GF(256) is a byte, a value from 0 to 255. But we think of it as a polynomial of degree ≤ 7 with binary coefficients:
$$\text{byte } b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0 \;\longleftrightarrow\; b_7 x^7 + b_6 x^6 + \cdots + b_1 x + b_0$$

For example:
| Byte (decimal) | Binary | Polynomial |
|---|---|---|
| 0 | 00000000 | $0$ |
| 1 | 00000001 | $1$ |
| 2 | 00000010 | $x$ |
| 3 | 00000011 | $x + 1$ |
| 19 | 00010011 | $x^4 + x + 1$ |
| 255 | 11111111 | $x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$ |
Addition is the simplest part: it's just bitwise XOR! Coefficients are binary (0 or 1) and $1 + 1 = 0$ mod 2 (no carrying!), so adding two polynomials is pure XOR:

$$19 \oplus 3 = \texttt{00010011} \oplus \texttt{00000011} = \texttt{00010000} = 16$$

In polynomial form: $(x^4 + x + 1) + (x + 1) = x^4$. The $x$ and constant terms cancel!
Multiplication is trickier. We multiply the polynomials normally, but then reduce modulo an irreducible polynomial of degree 8:

$$p(x) = x^8 + x^4 + x^3 + x^2 + 1 \quad \text{(binary: \texttt{100011101} = 0x11D)}$$

This polynomial is to GF(256) what a prime number is to modular arithmetic: it can't be factored, and dividing by it gives well-behaved remainders.
Example: Multiply $3 \times 7$ in GF(256).
$3 = x + 1$ and $7 = x^2 + x + 1$.
$$ (x + 1)(x^2 + x + 1) = x^3 + x^2 + x + x^2 + x + 1 = x^3 + 1 = 9 $$

(Remember: $x^2 + x^2 = 0$ and $x + x = 0$; they cancel.) The result is 9, already below degree 8, so no reduction needed.
When do we reduce? When the product has degree ≥ 8, we divide by $p(x)$ and keep the remainder, just like clock arithmetic but with polynomials. This guarantees the result is always a valid byte (0–255).
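The whole multiply-then-reduce routine is a few lines of Python (the classic shift-and-XOR loop; function name ours):

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the QR reduction polynomial

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(256): shift-and-XOR, reducing whenever degree hits 8."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # "add" = XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree reached 8: subtract p(x)
            a ^= POLY
    return result

print(gf_mul(3, 7))    # 9, as in the worked example
print(19 ^ 3)          # 16 -- addition is plain XOR
```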
GF(256) has a special property: the element $\alpha = 2$ (i.e., $x$) is a generator. If you keep multiplying by 2:
$$\alpha^0 = 1, \quad \alpha^1 = 2, \quad \alpha^2 = 4, \quad \alpha^3 = 8, \quad \ldots$$

The powers cycle through all 255 nonzero elements before returning to 1 at $\alpha^{255} = 1$. This means:

- every nonzero byte is some power $\alpha^i$, so it has a "logarithm" $i$;
- multiplication turns into addition of exponents: $\alpha^i \cdot \alpha^j = \alpha^{(i + j) \bmod 255}$;
- division turns into subtraction of exponents: $\alpha^i / \alpha^j = \alpha^{(i - j) \bmod 255}$.
This is how QR decoders multiply efficiently: just look up the exponents in a table, add them, and look up the result. A $256 \times 256$ multiplication table compressed into two 256-entry tables!
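A sketch of those two tables in Python (the names `EXP` and `LOG` are ours; `EXP` is doubled in length so exponent sums need no modulo):

```python
POLY = 0x11D

# Build exp/log tables by repeatedly multiplying by the generator alpha = 2.
EXP = [0] * 512          # doubled so exponent sums need no "mod 255"
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value       # EXP[i] = alpha^i
    LOG[value] = i       # LOG[alpha^i] = i
    value <<= 1
    if value & 0x100:
        value ^= POLY
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Table-driven multiply: add exponents, look up the power."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf_mul(3, 7))   # 9 -- matches the shift-and-XOR method
```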
The size is no coincidence. $256 = 2^8$, and one byte is 8 bits. By working in GF($2^8$), every field element fits in exactly one byte. No overflow, no fractions, no wasted space. This is why Reed-Solomon is so natural for digital data: the math and the hardware speak the same language.
Now we have our number system. Here's how Reed-Solomon actually protects a QR message.
Your $k$ data codewords $d_0, d_1, \ldots, d_{k-1}$ become coefficients of a polynomial:
$$m(x) = d_0 \, x^{k-1} + d_1 \, x^{k-2} + \cdots + d_{k-1}$$

For a Version 1-L QR code: $k = 19$ data codewords, so $m(x)$ has degree 18.
To produce $t$ error correction codewords, we build a generator polynomial $g(x)$ of degree $t$:
$$g(x) = (x - \alpha^0)(x - \alpha^1)(x - \alpha^2) \cdots (x - \alpha^{t-1})$$

Remember: subtraction = XOR in GF(256), so $x - \alpha^i = x + \alpha^i$ here. For Version 1-L with $t = 7$:
$$g(x) = (x + 1)(x + 2)(x + 4)(x + 8)(x + 16)(x + 32)(x + 64)$$

Multiplying these out (in GF(256) arithmetic) gives a degree-7 polynomial with known coefficients.
Shift $m(x)$ up by multiplying by $x^t$, then divide by $g(x)$:
$$x^t \cdot m(x) = q(x) \cdot g(x) + r(x)$$

The remainder $r(x)$ has degree $< t$; its coefficients are your error correction codewords!
The full transmitted codeword is:
$$c(x) = x^t \cdot m(x) - r(x)$$

By construction, $g(x)$ divides $c(x)$ evenly. The receiver checks this: if the remainder isn't zero, errors occurred, and the math can pinpoint exactly which bytes are wrong.
Let's trace a small example with $k = 1$ data byte and $t = 4$ EC bytes.
Data: ASCII 65 = 0x41. Message polynomial: $m(x) = 65$.
Generator: $g(x) = (x + 1)(x + 2)(x + 4)(x + 8)$ β a degree-4 polynomial in GF(256).
Shift: $x^4 \cdot m(x) = 65\,x^4$.
Divide $65\,x^4$ by $g(x)$ in GF(256) arithmetic; the remainder is $r(x) = r_3 x^3 + r_2 x^2 + r_1 x + r_0$.
Those four remainder coefficients are the error correction bytes. The transmitted block is: $[65, r_3, r_2, r_1, r_0]$.
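The worked example can be checked end to end in Python (function names ours; a sketch of the standard polynomial-division encoder, not production code):

```python
POLY = 0x11D

def gf_mul(a: int, b: int) -> int:
    """GF(256) multiply: shift-and-XOR, reducing modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result

def rs_generator(t: int) -> list[int]:
    """g(x) = (x + 1)(x + 2)(x + 4)...(x + 2^(t-1)), highest degree first."""
    g = [1]
    alpha_i = 1
    for _ in range(t):
        g = [c ^ (gf_mul(g[j - 1], alpha_i) if j else 0)
             for j, c in enumerate(g + [0])]
        alpha_i = gf_mul(alpha_i, 2)
    return g

def rs_ec_bytes(data: list[int], t: int) -> list[int]:
    """Remainder of x^t * m(x) divided by g(x): the EC codewords."""
    g = rs_generator(t)
    rem = data + [0] * t
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j in range(len(g)):
                rem[i + j] ^= gf_mul(g[j], factor)
    return rem[-t:]

print(rs_generator(4))       # [1, 15, 54, 120, 64]
print(rs_ec_bytes([65], 4))  # the four EC bytes for 'A'
```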
When the receiver gets a (possibly corrupted) message, the Reed-Solomon decoder:

1. evaluates the received polynomial at $\alpha^0, \alpha^1, \ldots, \alpha^{t-1}$ to compute the syndromes (all zero means no errors);
2. uses the syndromes to work out which byte positions are wrong;
3. computes the error values and XORs them away, restoring the original message.
Here's what each QR error correction level allocates for Version 1 (21Γ21, 26 total codewords):
| Level | Data codewords | EC codewords | Correctable bytes | Recovery |
|---|---|---|---|---|
| L | 19 | 7 | 3 | ~7% |
| M | 16 | 10 | 5 | ~15% |
| Q | 13 | 13 | 6 | ~25% |
| H | 9 | 17 | 8 | ~30% |
At level H, 17 out of 26 codewords are error correction β you can obliterate nearly a third of the code and it still scans. This is why companies can slap logos right in the middle of QR codes!
| Method | Rate | Corrects | Used by |
|---|---|---|---|
| Repetition 🥉 | $\frac{1}{3}$ | 1 per triplet | (Toy example) |
| Parity 🥈 | $\frac{8}{9}$ | Detects only | Credit cards, ISBN |
| Hamming(7,4) 🥇 | $\frac{4}{7}$ | 1 per block | ECC memory |
| Reed-Solomon 🏆 | Varies | Burst errors | QR, CD, DVD, Voyager |
Answers:

- Repetition decoding: 1010101, 1111111, 0000000.
- Parity bits: 1010101 0 (4 ones), 1111111 1 (8 ones), 0000000 0 (0 ones).
- Byte-mode encoding: 0100 00000010 01001000 01101001 0000 = 32 bits = 4 codewords.