πŸ“± Scan Me If You Can β€” From Bits to QR Codes

Ages: 11–14  Β·  Duration: 105 minutes  Β·  Topics: Binary encoding, ASCII, Unicode, Error detection & correction, Galois fields, Reed-Solomon codes, QR code structure


Part 0 β€” Warm-Up: Twenty Questions with a Computer (10 min)

The Big Idea

"You've scanned hundreds of QR codes β€” restaurant menus, concert tickets, WiFi passwords. But have you ever stopped to ask: how does a camera turn a jumble of tiny squares into a website? Today, we find out β€” and by the end, you'll know enough to build one by hand."

Let's start with a game you already know.

πŸ—£οΈ "I'm thinking of a number between 1 and 100. You can ask yes-or-no questions. How many questions do you need to guarantee you find it?"

The trick: cut the possibilities in half each time. Seven questions suffice for 1–100 because $2^7 = 128 > 100$.
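The halving strategy is easy to simulate. Here is a toy Python sketch (the helper name `count_questions` is ours, purely illustrative) that plays the strategy against every secret from 1 to 100 and confirms seven questions always suffice:

```python
def count_questions(secret, lo=1, hi=100):
    """Play the halving strategy against a known secret; count the questions."""
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1               # one yes/no question: "is it <= mid?"
        if secret <= mid:
            hi = mid                 # answer "yes": keep the lower half
        else:
            lo = mid + 1             # answer "no": keep the upper half
    return questions

worst_case = max(count_questions(n) for n in range(1, 101))
print(worst_case)  # 7 -- matches 2^7 = 128 > 100
```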

Each yes/no answer is one bit of information (from binary digit). A bit has exactly two values: 0 or 1 β€” like a light switch, a coin flip, or a black-vs-white square.

Quick Practice

| Items | Bits needed | Because… |
|---|---|---|
| 2 | 1 | $2^1 = 2$ |
| 4 | 2 | $2^2 = 4$ |
| 8 | 3 | $2^3 = 8$ |
| 26 letters | 5 | $2^5 = 32 \ge 26$ |
| 256 symbols | 8 | $2^8 = 256$ |

Key Rule: $n$ bits can represent $2^n$ different things. With 8 bits (a byte) we get 256 possibilities β€” enough for every letter, digit, and punctuation mark on your keyboard.
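The Key Rule translates directly into Python β€” a quick sketch (the helper name `bits_needed` is our own), using the fact that $n - 1$ written in binary has exactly $\lceil \log_2 n \rceil$ digits for $n \ge 2$:

```python
def bits_needed(n_items):
    """Smallest number of bits b with 2**b >= n_items (for n_items >= 2)."""
    return (n_items - 1).bit_length()

print(bits_needed(26))   # 5 -- enough for the alphabet
print(bits_needed(256))  # 8 -- one byte covers every keyboard character
```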

Part 1 β€” ASCII: Every Letter Has a Secret Number (β‰ˆ 15 min)

Setting the Scene

In 1963, American engineers agreed on a standard code called ASCII β€” the American Standard Code for Information Interchange. The idea: assign each character a number from 0 to 127.

πŸ”€ Interactive: ASCII ↔ Binary Converter


The One-Bit Trick

Compare 'A' and 'a' bit by bit:

'A' = 0 1 0 0 0 0 0 1   (65)
'a' = 0 1 1 0 0 0 0 1   (97)
          ^
    Only this bit differs!

The difference: $97 - 65 = 32 = 2^5$. Exactly one bit β€” bit 5 β€” controls uppercase vs. lowercase.

πŸ’‘ "This isn't a coincidence. The ASCII designers planned it! To convert any uppercase letter to lowercase, a computer just flips one bit."
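A two-line Python sketch makes the one-bit trick concrete (the helper name `toggle_case` is our own):

```python
def toggle_case(ch):
    """Flip bit 5 (value 32) of an ASCII letter: uppercase <-> lowercase."""
    return chr(ord(ch) ^ 0b00100000)

print(toggle_case('A'))  # a
print(toggle_case('q'))  # Q
```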

Quick Detour: Unicode

Unicode is the modern standard with over 150,000 characters. ASCII is a perfect subset: UTF-8 encodes every ASCII character unchanged, so everything we learn today still works! The πŸŽ‰ emoji takes 4 bytes (32 bits) in UTF-8.

The Key Insight: $$\text{character} \;\longrightarrow\; \text{number (ASCII)} \;\longrightarrow\; \text{bits} \;\longrightarrow\; \text{black/white squares}$$ QR codes are just the last step made physical.

Part 2 β€” The QR Code Challenge ⭐ (β‰ˆ 20 min)

You're given an empty $21 \times 21$ grid. Your mission: fill in the squares to encode the letter 'A' so a phone camera can read it. Then show what changes for 'B' and 'a'.

Anatomy of a QR Code

A Version 1 QR code is a $21 \times 21$ grid of modules. Not all carry data β€” some are fixed furniture:

| Region | Purpose | How to spot it |
|---|---|---|
| Finder patterns (Γ—3) | "I'm a QR code!" + orientation | Three big squares in corners |
| Timing patterns | Ruler β€” alternating stripes | Row 6 and column 6 |
| Format information | Error level + mask pattern | Adjacent to finders |
| Data + EC area | Your actual message + error correction | Everything else |

πŸ” Interactive: QR Code Anatomy (Version 1)

Hover over regions to identify them. Click a character button to fill in data bits.

The Encoding Recipe β€” Byte Mode

| Field | Bits | Content | For 'A' |
|---|---|---|---|
| Mode indicator | 4 | 0100 = Byte mode | 0100 |
| Character count | 8 | How many characters | 00000001 |
| Character data | 8 | The ASCII value | 01000001 |
| Terminator | 4 | End marker | 0000 |

$$\underbrace{0100}_{\text{mode}} \;\underbrace{00000001}_{\text{count}=1} \;\underbrace{01000001}_{\text{ASCII 65}='A'} \;\underbrace{0000}_{\text{end}}$$
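Here's a minimal Python sketch of the recipe above (the helper name is illustrative; a real encoder would also pad the stream out to the symbol's full data capacity):

```python
def qr_byte_mode_bits(text):
    """Mode indicator + 8-bit character count + 8 bits per character + terminator."""
    bits = "0100"                          # mode indicator: byte mode
    bits += format(len(text), "08b")       # character count
    for ch in text:
        bits += format(ord(ch), "08b")     # one byte per ASCII character
    bits += "0000"                         # terminator
    return bits

print(qr_byte_mode_bits("A"))  # 010000000001010000010000 -- 24 bits = 3 codewords
```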

πŸ’‘ Interactive: QR Byte-Mode Bit Stream

The Big Reveal: Swapping Letters

| Char | Byte 1 | Byte 2 | Byte 3 | Bits changed vs. 'A' |
|---|---|---|---|---|
| A | 0x40 | 0x14 | 0x10 | β€” |
| B | 0x40 | 0x14 | 0x20 | 2 (byte 3) |
| a | 0x40 | 0x16 | 0x10 | 1 (byte 2) |

Dead-End: Life Without Error Correction
A single speck of dirt could flip one module β€” turning 'A' into 'a', 'C', or 'E'. The scanner has no way to tell something went wrong. Without error correction, a QR code is as fragile as writing in pencil during a rainstorm.

β˜• Break (5 min)


Part 3 β€” Error Correction: Mathematics to the Rescue (β‰ˆ 10 min)

The Noisy Channel Problem

| Channel | What goes wrong |
|---|---|
| QR code | Scratches, dirt, camera blur, bad printing |
| WiFi | Electromagnetic interference |
| CD/DVD | Physical scratches |
| Deep space | Cosmic rays (Voyager, 24 billion km away!) |

Hamming Distance

The Hamming distance between two bit strings = the number of positions where they differ.

| Pair | Hamming distance |
|---|---|
| A ↔ a | 1 |
| A ↔ B | 2 |
| A ↔ C | 1 |
| A ↔ E | 1 |

Fundamental Rule: To detect $t$ errors β†’ minimum distance $\ge t + 1$. To correct $t$ errors β†’ minimum distance $\ge 2t + 1$.
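Because differing bits survive XOR as 1s, Hamming distance is essentially one line of Python (a sketch; `hamming_distance` is our own name):

```python
def hamming_distance(a, b):
    """Count the bit positions where two integers differ."""
    return bin(a ^ b).count("1")

print(hamming_distance(ord('A'), ord('a')))  # 1 -- one speck of dirt away
print(hamming_distance(ord('A'), ord('B')))  # 2
```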

QR Error Correction Levels

| Level | Recovery | Use case |
|---|---|---|
| L | ~7% | Clean environment |
| M | ~15% | General use |
| Q | ~25% | Outdoor / industrial |
| H | ~30% | Harsh conditions β€” logos OK! |

Solution 1 β€” Repetition: Say It Three Times πŸ₯‰

The simplest fix: send every bit three times. Decode by majority vote.

Original:   0  1  0  0  0  0  0  1
Tripled:  000 111 000 000 000 000 000 111   (24 bits for 8!)
$$\text{Rate} = \tfrac{1}{3}, \quad \text{corrects 1 error per triplet}$$ Works, but triples your message size. Surely maths can do better?
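A quick Python sketch of the repetition code (toy helpers with illustrative names):

```python
def triple(bits):
    """Encode: send every bit three times."""
    return "".join(b * 3 for b in bits)

def majority_decode(tripled):
    """Decode: majority vote within each group of three."""
    return "".join(
        "1" if tripled[i:i + 3].count("1") >= 2 else "0"
        for i in range(0, len(tripled), 3)
    )

sent = triple("01000001")                                        # 24 bits for 8
noisy = sent[:4] + ("0" if sent[4] == "1" else "1") + sent[5:]   # flip one bit
print(majority_decode(noisy))                                    # 01000001 -- recovered
```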

Solution 2 β€” Parity Bits: The Cheapest Detector πŸ₯ˆ

Add one extra bit that makes the total number of 1s always even.

For 'A' = 01000001: two 1s (even) β†’ parity bit = 0 β†’ send 01000001 0.

For 'a' = 01100001: three 1s (odd) β†’ parity bit = 1 β†’ send 01100001 1.

If any single bit flips, the total becomes odd β†’ error detected!

Dead-End: Parity can detect 1 error but can't tell you which bit is wrong. And if 2 bits flip simultaneously, parity doesn't notice at all!
$$\text{Rate} = \tfrac{8}{9}, \quad \text{detects 1 error, corrects none}$$
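Even parity in Python β€” a minimal sketch (helper names are ours):

```python
def add_parity(bits):
    """Append one bit so the total count of 1s is even."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def parity_ok(received):
    """True if the received bits (including parity) have an even number of 1s."""
    return received.count("1") % 2 == 0

print(add_parity("01000001"))        # 010000010 -- 'A' has two 1s, parity 0
print(parity_ok("010000011"))        # False -- one flipped bit is detected
```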

Solution 3 β€” Hamming(7,4): Find AND Fix πŸ₯‡

4 data bits + 3 parity bits = 7 total. Parity bits sit at positions 1, 2, 4 (powers of 2). Each covers positions whose binary representation has a 1 in that column.

πŸ”§ Interactive: Hamming(7,4) Encoder & Error Corrector

Enter 4 data bits. Click any position in the codeword to inject an error.

Worked Example

Protect the first nibble of 'A' = 0100:

$d_1=0, d_2=1, d_3=0, d_4=0$

Parity bits: $p_1 = 0 \oplus 1 \oplus 0 = 1$, $p_2 = 0 \oplus 0 \oplus 0 = 0$, $p_3 = 1 \oplus 0 \oplus 0 = 1$.

Codeword: 1 0 0 1 1 0 0

If noise flips position 5 β†’ received 1 0 0 1 0 0 0.

Syndrome: $s_3 s_2 s_1 = 101_2 = 5$ β†’ error at position 5! Flip it back. βœ“

$$\text{Hamming}(7,4): \;\text{rate} = \tfrac{4}{7} \approx 57\%, \;\text{corrects 1 error per 7 bits}$$
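The full encode-and-correct cycle fits in a short Python sketch (function names are ours; positions follow the 1-to-7 convention above):

```python
def hamming74_encode(d1, d2, d3, d4):
    """Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    p1 = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Return (corrected codeword, error position); position 0 means no error."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # the syndrome spells out the error position
    fixed = list(c)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed, pos

code = hamming74_encode(0, 1, 0, 0)   # [1, 0, 0, 1, 1, 0, 0] -- as in the example
code[4] ^= 1                          # noise flips position 5
print(hamming74_correct(code))        # corrected back; error position 5
```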

Solution 4 β€” Reed-Solomon: QR's Polynomial Shield πŸŽ“

Hamming codes correct single-bit errors. But a coffee stain on a QR code doesn't flip one lonely bit β€” it obliterates entire bytes in a cluster. QR codes need something far more powerful: Reed-Solomon codes.

The Core Insight: Polynomials as Data

Forget bits for a moment. Think about polynomials.

A polynomial of degree $n$ is uniquely determined by $n + 1$ points. A straight line (degree 1) passes through 2 points. A parabola (degree 2) needs 3. If you have more points than the minimum, you can afford to lose some and still recover the curve.

Analogy: Imagine you want to transmit a straight line $y = 3x + 5$. Only 2 points define it β€” but you send 5 points instead. Even if 1 point gets corrupted, 4 correct points still overdetermine the line. With enough redundancy, you can find and fix the bad point.

Reed-Solomon principle: Treat your data bytes as polynomial coefficients. Evaluate the polynomial at extra points to produce error correction bytes. Send everything. On the receiving end, even if some bytes are damaged, the polynomial can be reconstructed from the surviving points.

Wait β€” Why Not Regular Numbers?

Here's the catch. Ordinary arithmetic breaks polynomial codes: integer coefficients quickly overflow a single byte, and division produces fractions that no byte can store.

We need a number system where:

  1. There are exactly 256 elements (one for each byte: 0x00 – 0xFF).
  2. You can add, subtract, multiply, and divide and always stay within those 256 values.
  3. Division by any nonzero element is exact β€” no fractions, no remainders.

Such a system exists. Mathematicians call it a Galois field.


GF(256) β€” The Secret Number System Inside Every QR Code

GF(256) β€” also written $\mathbb{F}_{256}$ or $\text{GF}(2^8)$ β€” is a finite field with exactly 256 elements. "GF" stands for Galois Field, named after Γ‰variste Galois, the French mathematician who invented the theory at age 18 (and tragically died in a duel at 20).

Elements: Bytes as Polynomials

Each element of GF(256) is a byte β€” a value from 0 to 255. But we think of it as a polynomial with binary coefficients of degree ≀ 7:

$$\text{byte } b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0 \;\longleftrightarrow\; b_7 x^7 + b_6 x^6 + \cdots + b_1 x + b_0$$

For example:

| Byte (decimal) | Binary | Polynomial |
|---|---|---|
| 0 | 00000000 | $0$ |
| 1 | 00000001 | $1$ |
| 2 | 00000010 | $x$ |
| 3 | 00000011 | $x + 1$ |
| 19 | 00010011 | $x^4 + x + 1$ |
| 255 | 11111111 | $x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1$ |

Addition in GF(256): XOR

Addition is the simplest part β€” it's just bitwise XOR! Since each coefficient is binary (0 or 1) and $1 + 1 = 0$ modulo 2 (no carrying!), adding two polynomials is pure XOR:

$$19 \oplus 3 = \texttt{00010011} \oplus \texttt{00000011} = \texttt{00010000} = 16$$

In polynomial form: $(x^4 + x + 1) + (x + 1) = x^4$. The $x$ and constant terms cancel!

πŸ’‘ In GF(256), addition = subtraction = XOR. Every element is its own additive inverse: $a + a = 0$ for all $a$.
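Two lines of Python confirm the arithmetic (`^` is Python's XOR operator):

```python
print(19 ^ 3)    # 16 -- the example above: (x^4 + x + 1) + (x + 1) = x^4
print(42 ^ 42)   # 0  -- every element is its own additive inverse
```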

Multiplication in GF(256): Polynomial Multiplication mod a Special Polynomial

Multiplication is trickier. We multiply the polynomials normally, but then reduce modulo an irreducible polynomial of degree 8:

$$p(x) = x^8 + x^4 + x^3 + x^2 + 1 \quad \text{(binary: \texttt{100011101} = 0x11D)}$$

This polynomial is to GF(256) what a prime number is to modular arithmetic β€” it can't be factored, and dividing by it gives well-behaved remainders.

Example: Multiply $3 \times 7$ in GF(256).

$3 = x + 1$ and $7 = x^2 + x + 1$.

$$ (x + 1)(x^2 + x + 1) = x^3 + x^2 + x + x^2 + x + 1 = x^3 + 1 = 9 $$

(Remember: $x^2 + x^2 = 0$ and $x + x = 0$ β€” they cancel.) The result is 9, already below degree 8, so no reduction needed.

When do we reduce? When the product has degree β‰₯ 8, we divide by $p(x)$ and keep the remainder β€” just like clock arithmetic, but with polynomials. This guarantees the result is always a valid byte (0–255).
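A minimal Python sketch of GF(256) multiplication β€” shift-and-XOR, reducing by $p(x)$ (0x11D) whenever the degree reaches 8 (the function name is ours):

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(256), reducing by x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # "add" (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:            # degree reached 8: subtract p(x), i.e. XOR 0x11D
            a ^= 0x11D
    return result

print(gf_mul(3, 7))  # 9 -- matches the worked example above
```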

πŸ”’ Interactive: GF(256) Calculator

Enter two byte values (0–255). See addition (XOR), multiplication, and division in GF(256).

The Generator Element $\alpha = 2$

GF(256) has a special property: the element $\alpha = 2$ (i.e., $x$) is a generator. If you keep multiplying by 2:

$$\alpha^0 = 1, \quad \alpha^1 = 2, \quad \alpha^2 = 4, \quad \alpha^3 = 8, \quad \ldots$$

The powers cycle through all 255 nonzero elements before returning to 1 at $\alpha^{255} = 1$. This means every nonzero byte is some power of $\alpha$, so multiplication turns into addition of exponents: $\alpha^i \cdot \alpha^j = \alpha^{(i+j) \bmod 255}$.

This is how QR decoders multiply efficiently: just look up the exponents in a table, add them, and look up the result. A $256 \times 256$ multiplication table compressed into two 256-entry tables!
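Here's how those two tables can be built in Python (a sketch; `EXP` is deliberately oversized so exponent sums never need an explicit modulo):

```python
EXP = [0] * 510          # alpha^0 .. alpha^509 (doubled so i + j never overflows)
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1                      # multiply by alpha = 2
    if value & 0x100:
        value ^= 0x11D               # reduce by x^8 + x^4 + x^3 + x^2 + 1
for power in range(255, 510):
    EXP[power] = EXP[power - 255]    # exponents wrap: alpha^255 = 1

def gf_mul_fast(a, b):
    """alpha^i * alpha^j = alpha^(i+j): two lookups and one integer addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

print(gf_mul_fast(3, 7))  # 9 -- same answer as multiplying polynomials by hand
```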

⚑ Interactive: Powers of α in GF(256)

Watch how $\alpha = 2$ generates all 255 nonzero elements. The value wraps back to 1 after power 255.


Why 256?

This is no coincidence. $256 = 2^8$, and one byte is 8 bits. By working in GF($2^8$), every field element fits in exactly one byte. No overflow, no fractions, no wasted space. This is why Reed-Solomon is so natural for digital data: the math and the hardware speak the same language.

The GF(256) cheat sheet:
β€’ 256 elements = bytes 0–255
β€’ Addition = XOR (fast!)
β€’ Multiplication = polynomial multiplication mod $x^8 + x^4 + x^3 + x^2 + 1$
β€’ Every nonzero element = power of $\alpha = 2$
β€’ Division always exact β€” no fractions

Reed-Solomon Encoding: Step by Step

Now we have our number system. Here's how Reed-Solomon actually protects a QR message.

Step 1: Form the Message Polynomial

Your $k$ data codewords $d_0, d_1, \ldots, d_{k-1}$ become coefficients of a polynomial:

$$m(x) = d_0 \, x^{k-1} + d_1 \, x^{k-2} + \cdots + d_{k-1}$$

For a Version 1-L QR code: $k = 19$ data codewords, so $m(x)$ has degree 18.

Step 2: Build the Generator Polynomial

To produce $t$ error correction codewords, we build a generator polynomial $g(x)$ of degree $t$:

$$g(x) = (x - \alpha^0)(x - \alpha^1)(x - \alpha^2) \cdots (x - \alpha^{t-1})$$

Remember: subtraction = XOR in GF(256), so $x - \alpha^i = x + \alpha^i$ here. For Version 1-L with $t = 7$:

$$g(x) = (x + 1)(x + 2)(x + 4)(x + 8)(x + 16)(x + 32)(x + 64)$$

Multiplying these out (in GF(256) arithmetic) gives a degree-7 polynomial with known coefficients.

Step 3: Polynomial Long Division

Shift $m(x)$ up by multiplying by $x^t$, then divide by $g(x)$:

$$x^t \cdot m(x) = q(x) \cdot g(x) + r(x)$$

The remainder $r(x)$ has degree < $t$ β€” its coefficients are your error correction codewords!

Step 4: Transmit Everything

The full transmitted codeword is:

$$c(x) = x^t \cdot m(x) - r(x)$$

By construction, $g(x)$ divides $c(x)$ evenly. The receiver checks this: if the remainder isn't zero, errors occurred β€” and the math can pinpoint exactly which bytes are wrong.
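Putting the four steps together β€” a compact Python sketch (helper names are ours; `gf_mul` is the shift-and-XOR multiplication modulo 0x11D described above):

```python
def gf_mul(a, b):
    """GF(256) multiply, reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return result

def rs_generator(t):
    """g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(t-1)), highest degree first."""
    g = [1]
    alpha_i = 1                               # alpha^0
    for _ in range(t):
        nxt = [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= coef                    # coef * x
            nxt[j + 1] ^= gf_mul(coef, alpha_i)
        g = nxt
        alpha_i = gf_mul(alpha_i, 2)          # next power of alpha
    return g

def rs_encode(data, t):
    """Long division of x^t * m(x) by g(x); the remainder is the EC bytes."""
    g = rs_generator(t)
    work = list(data) + [0] * t
    for i in range(len(data)):
        factor = work[i]
        if factor:
            for j in range(1, len(g)):        # g[0] = 1, so position i is eliminated
                work[i + j] ^= gf_mul(g[j], factor)
    return work[len(data):]

def poly_eval(p, x):
    """Evaluate a polynomial (coefficients highest degree first) at x, in GF(256)."""
    y = 0
    for coef in p:
        y = gf_mul(y, x) ^ coef
    return y

codeword = [65] + rs_encode([65], 4)          # 'A' plus 4 EC bytes
print(all(poly_eval(codeword, a) == 0 for a in (1, 2, 4, 8)))  # True: g divides c
```

Since the roots of $g(x)$ are $\alpha^0 \ldots \alpha^3 = 1, 2, 4, 8$, the transmitted codeword evaluates to zero at each of them β€” exactly the check the receiver runs first.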

πŸ›‘οΈ Interactive: Reed-Solomon Encoding (Simplified)

Enter a short message (up to 8 characters). See how data bytes become a message polynomial, and how RS generates error correction bytes over GF(256).

Worked Example: Encoding "A" with 4 EC Bytes

Let's trace a small example with $k = 1$ data byte and $t = 4$ EC bytes.

Data: ASCII 65 = 0x41. Message polynomial: $m(x) = 65$.

Generator: $g(x) = (x + 1)(x + 2)(x + 4)(x + 8)$ β€” a degree-4 polynomial in GF(256).

Shift: $x^4 \cdot m(x) = 65\,x^4$.

Divide $65\,x^4$ by $g(x)$ in GF(256) arithmetic β†’ remainder $r(x) = r_3 x^3 + r_2 x^2 + r_1 x + r_0$.

Those four remainder coefficients are the error correction bytes. The transmitted block is: [65, r₃, rβ‚‚, r₁, rβ‚€].

Decoding: Finding and Fixing Errors

When the receiver gets a (possibly corrupted) message, the Reed-Solomon decoder:

  1. Computes syndromes: Evaluate the received polynomial at $\alpha^0, \alpha^1, \ldots, \alpha^{t-1}$. If all zero β†’ no errors!
  2. Locates errors: Uses the Berlekamp-Massey algorithm (or Euclid's algorithm) to find which byte positions are wrong.
  3. Fixes them: Computes the correct values using Forney's algorithm.
Error correction capacity: With $t$ EC codewords, Reed-Solomon can correct up to $\lfloor t/2 \rfloor$ corrupted codewords, or detect up to $t$ corrupted codewords. If the positions of errors are already known ("erasures"), it can recover up to $t$ of them.
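Step 1 (the syndrome check) is simple enough to sketch in Python. The all-zero block is a valid codeword (zero times $g(x)$), which makes it a convenient test subject; the helper names are ours:

```python
def gf_mul(a, b):
    """GF(256) multiply, reducing by 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return result

def syndromes(received, t):
    """Evaluate the received polynomial at alpha^0 .. alpha^(t-1)."""
    out = []
    alpha_i = 1
    for _ in range(t):
        y = 0
        for coef in received:
            y = gf_mul(y, alpha_i) ^ coef
        out.append(y)
        alpha_i = gf_mul(alpha_i, 2)
    return out

clean = [0, 0, 0, 0, 0]               # a valid (if boring) codeword
print(syndromes(clean, 4))            # [0, 0, 0, 0] -- no errors
damaged = [0, 0, 0x41, 0, 0]          # a "coffee stain" replaces one byte
print(any(syndromes(damaged, 4)))     # True -- the damage is detected
```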

QR Code EC in Practice

Here's what each QR error correction level allocates for Version 1 (21Γ—21, 26 total codewords):

| Level | Data codewords | EC codewords | Correctable bytes | Recovery |
|---|---|---|---|---|
| L | 19 | 7 | 3 | ~7% |
| M | 16 | 10 | 5 | ~15% |
| Q | 13 | 13 | 6 | ~25% |
| H | 9 | 17 | 8 | ~30% |

At level H, 17 out of 26 codewords are error correction β€” you can obliterate nearly a third of the code and it still scans. This is why companies can slap logos right in the middle of QR codes!

Mind-Bending Fact: Hamming(7,4) corrects one bit per 7-bit block. Reed-Solomon over GF(256) corrects entire bytes β€” and it doesn't matter whether 1 bit or all 8 bits of that byte are wrong. A byte is a byte. This makes RS vastly superior against burst errors, where consecutive bits get damaged (scratches, stains, camera blur).
$$\text{Reed-Solomon over GF}(256): \text{ corrects burst errors at byte level β€” powers every QR code on Earth}$$

Summary

Encoding Chain

$$\text{Char} \xrightarrow{\text{ASCII}} \text{Number} \xrightarrow{\text{Binary}} \text{Bits} \xrightarrow{\text{QR}} \text{Codewords} \xrightarrow{\text{RS}} \text{Protected} \xrightarrow{\text{Grid}} \text{QR Code}$$

Method Comparison

| Method | Rate | Corrects | Used by |
|---|---|---|---|
| Repetition πŸ₯‰ | $\frac{1}{3}$ | 1 per triplet | (Toy example) |
| Parity πŸ₯ˆ | $\frac{8}{9}$ | Detects only | Credit cards, ISBN |
| Hamming(7,4) πŸ₯‡ | $\frac{4}{7}$ | 1 per block | ECC memory |
| Reed-Solomon πŸŽ“ | Varies | Burst errors | QR, CD, DVD, Voyager |

Extensions & Challenge Problems 🧩

πŸ₯‰ Bronze
  1. Write your full name in ASCII binary. How many bits?
  2. Compute $d(\text{'H'}, \text{'h'})$ and $d(\text{'0'}, \text{'9'})$. Which pair is more vulnerable?
  3. Add an even-parity bit to: 1010101, 1111111, 0000000.
πŸ₯ˆ Silver
  1. Write the full QR byte-mode bit stream for "Hi" (two characters). How many data codewords?
  2. Using 4-bit words, find a set of codewords with minimum Hamming distance 3. What's the maximum size?
  3. Version 1-H uses 9 data codewords. How many ASCII chars fit in byte mode?
πŸ₯‡ Gold
  1. Verify the Hamming bound for Hamming(7,4) with $t=1$: $\frac{2^7}{1+\binom{7}{1}} = 16 = 2^4$. Is it tight?
  2. Extend Hamming(7,4) with an 8th overall parity bit (SECDED). Show it distinguishes 1 vs. 2 errors.
πŸŽ“ Diamond
  1. QR Quine: Can a QR code encode its own pixel data? Version 1 has 441 modules but storing 441 bits requires more capacity than Version 1 provides. What's the smallest self-referential version?
  2. Shannon's promise: For a binary channel with 5% error rate, $C = 1 - H(0.05) \approx 0.71$. What does this mean for QR code design?

Answer Key

Bronze Answers
  1. "Alice" = 5 chars Γ— 8 bits = 40 bits.
  2. $d(\text{'H'},\text{'h'}) = 1$; $d(\text{'0'},\text{'9'}) = 2$. Letters are more vulnerable.
  3. 1010101 0 (4 ones), 1111111 1 (8 ones), 0000000 0 (0 ones).
Silver Answers
  1. Stream: 0100 00000010 01001000 01101001 0000 = 32 bits = 4 codewords.
  2. Hamming bound: $2^4/(1+4) = 3.2$, so at most 3 codewords β€” but the bound isn't tight here: the true maximum is 2, e.g. {0000, 1110}. (Any two 4-bit words of weight $\ge 3$ share at least two 1s, so their distance is at most 2 β€” no third codeword fits.)
  3. 9 codewords = 72 bits. Overhead: 4+8+4 = 16 bits. Data: 56 bits β†’ 7 characters.
Gold Answers
  1. $128/8 = 16 = 2^4$. Equality holds β€” Hamming(7,4) is a perfect code!
  2. Syndrome β‰  0 + odd overall parity β†’ 1 error (correct). Syndrome β‰  0 + even overall parity β†’ 2 errors (report).

← Back to all lectures