Error correcting codes
An error correcting code (ECC) is an encoding scheme that transmits messages as binary numbers in such a way that the message can be recovered even if some bits are erroneously flipped. ECCs are used in practically all forms of message transmission, and especially in data storage, where they defend against data corruption.
Intuitive explanation through an example
Suppose Alice is trying to transmit a message to Bob over a noisy telephone line and wants to ensure that Bob receives the message correctly. Alice has several options, the most obvious being for Bob to repeat the message back so Alice can confirm it. Unfortunately, this is not always possible (as in one-way communication links) and may be costly if, for example, Alice is distributing the same message to several people simultaneously. Another option is for Alice to repeat the message multiple times, but this is fairly inefficient.
Alternatively, Alice could choose to use words that are not easily confused with one another, so that even if Bob slightly mishears a word, he can figure out the intended word from context. This is the concept behind, for instance, the NATO phonetic alphabet, in which the letters are represented by "Alpha", "Bravo", "Charlie", and so on. These words are specifically chosen to be dissimilar from one another, so that, for instance, hearing just "vo" is enough to guess that the intended letter was B. This is true of practically any pair of words in the table below: almost no pair sounds similar, so a single syllable is often enough to identify the intended letter.
Letter | Code | Letter | Code |
A | Alpha | N | November |
B | Bravo | O | Oscar |
C | Charlie | P | Papa |
D | Delta | Q | Quebec |
E | Echo | R | Romeo |
F | Foxtrot | S | Sierra |
G | Golf | T | Tango |
H | Hotel | U | Uniform |
I | India | V | Victor |
J | Juliet | W | Whiskey |
K | Kilo | X | X-ray |
L | Lima | Y | Yankee |
M | Mike | Z | Zulu |
Error correcting codes operate in the same way Alice does, but with binary numbers rather than words. Each letter (or, more generally, each member of a possible set of messages) is encoded as a binary number, and the bits are transmitted in succession. In this sense, the "noisy telephone" corresponds to some probability that a bit is transmitted incorrectly (i.e. it is received as a 1 despite actually being a 0, or vice versa). The goal of an ECC is to transmit a message that can be accurately interpreted even if some bits are accidentally flipped.
Repetition codes
The simplest example of an ECC is analogous to Alice repeating the message: when a bit is to be transmitted, it is instead transmitted multiple times (usually 3), and the value that appears the majority of the time is interpreted as the intended bit. More specifically:
Received Message | Interpreted bit |
000 | 0 |
001 | 0 |
010 | 0 |
100 | 0 |
011 | 1 |
101 | 1 |
110 | 1 |
111 | 1 |
For example, the message "10101" would be encoded as 111000111000111, but might be received as 111010111000110, where the 5th and 15th transmitted bits have been accidentally flipped. This would still be interpreted correctly, as
Received Message | Interpreted bit |
111 | 1 |
010 | 0 |
111 | 1 |
000 | 0 |
110 | 1 |
will be interpreted as "10101", the intended message. Thus this method of transmission recovers the correct message despite two bits being flipped. More precisely, if each transmitted bit is flipped with probability \(p\), a data bit is decoded incorrectly only if at least two of its three copies are flipped, which happens with probability \(p^3+3p^2(1-p)\). For small \(p\) this is roughly \(3p^2\), far smaller than the probability \(p\) of error when the bit is transmitted only once.
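To make this concrete, here is a minimal Python sketch (the helper names are illustrative, not from any standard library) that encodes the example message, decodes the corrupted transmission above by majority vote, and evaluates the error-probability formula for a sample value of \(p\):

```python
# (3,1) repetition code: send each bit three times, decode by majority vote.

def encode(bits):
    return "".join(b * 3 for b in bits)

def decode(received):
    groups = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if g.count("1") >= 2 else "0" for g in groups)

message = "10101"
print(encode(message))                  # 111000111000111
print(decode("111010111000110"))        # 10101, despite two flipped bits

# A data bit is decoded incorrectly only if at least two of its three
# copies are flipped, which happens with probability p^3 + 3p^2(1-p).
p = 0.01
print(p**3 + 3 * p**2 * (1 - p))        # ~0.0003, far smaller than p = 0.01
```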
A (very) noisy channel is known to flip bits 20% of the time. Alice decides to send the same bit multiple times, and Bob will interpret the message as the bit he receives more times (e.g. if Bob received 0, 1, 1, he would conclude the message was 1).
If Alice wants to ensure the message is correctly received with probability at least 95%, what is the minimum number of times Alice needs to send each bit?
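One rough way to explore questions like this (a sketch, not a prescribed solution; the function name is made up for illustration) is to sum binomial probabilities: with \(n\) transmissions, taken odd so that ties cannot occur, the majority vote succeeds as long as fewer than half of the copies are flipped. Sweeping odd values of \(n\) at a 20% flip probability and reading off the first success probability that reaches 0.95 answers the question above.

```python
from math import comb

def majority_success(n, p):
    """Probability that a majority vote over n independent copies recovers
    the bit, when each copy is flipped with probability p (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range((n - 1) // 2 + 1))

for n in (1, 3, 5, 7, 9):
    print(n, round(majority_success(n, 0.2), 4))
```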
This code is called the (3,1) repetition code, meaning that three bits are transmitted for each single data bit. It can correct a message when one of the transmitted bits is flipped - a single-bit error - so the correcting ability of this code is 1 bit. It can also detect that an error occurred when at most two bits are flipped, so the detecting ability of this code is 2 bits.
However, this encoding scheme is fairly inefficient, as it requires the sender to transmit three times the message's length in order to achieve single bit correction. More formally, the code rate of this code is \(\frac{1}{3}\) - the number of data bits divided by the number of transmitted bits.
An even simpler example of a repetition code is the parity code, in which each bit is transmitted twice:
Message | Encoding |
0 | 00 |
1 | 11 |
It can detect, but not correct, single-bit errors. The importance of this code lies in the concept of a parity bit, which is a bit added to make the number of 1s in each encoding even. By doing so, any message with an odd number of 1s can immediately be recognized as erroneous. More generally, parity bits can be added to make the number of 1s in specific positions even, also making any message that has an odd number of ones in those positions immediately recognizable as erroneous.
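As a small illustration of the parity-bit idea (the 4-bit block size and helper names below are arbitrary choices for this sketch), one can append a bit that makes the total number of 1s even, and then flag any received word with an odd number of 1s as erroneous:

```python
def add_parity(bits):
    # Append a parity bit so the codeword contains an even number of 1s.
    return bits + str(bits.count("1") % 2)

def has_error(word):
    # An odd number of 1s means at least one bit was flipped in transit.
    return word.count("1") % 2 == 1

codeword = add_parity("1011")                 # "10111", four 1s in total
assert not has_error(codeword)

corrupted = ("0" if codeword[0] == "1" else "1") + codeword[1:]
assert has_error(corrupted)                   # a single flip is detected
```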
Hamming codes
A more efficient encoding scheme is a Hamming code, which is analogous to the phonetic alphabet from the opening section. In a Hamming code, every possible message string is encoded as a particular binary number, with the set of numbers specifically chosen so that any two of them are significantly different in some sense; in other words, every pair of encoded messages is substantially different by some measure.
That measure is the Hamming distance. The Hamming distance between two binary numbers is the number of positions at which their bits differ. For example, 1101010 and 1111000 are a Hamming distance of 2 apart, differing in the third and sixth positions:
1101010
1111000
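Computing this directly is straightforward; a brief Python sketch:

```python
def hamming_distance(a, b):
    # Count the positions at which two equal-length bit strings differ.
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("1101010", "1111000"))   # 2
```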
The key here is that if every pair of encodings is sufficiently far apart in terms of Hamming distance, errors can be detected and corrected by finding which codeword is closest to the received message. For example, consider the encoding
Letter | Encoding |
A | 000 |
B | 011 |
C | 101 |
D | 110 |
In this encoding, the minimum Hamming distance between encodings is 2, which means that single-bit errors can be detected -- i.e. if a single bit is flipped during transmission of a letter, it can be determined that an error was made. It cannot, however, be determined what the original message was; for example, a transmitted message of "010" could have been a single-bit error resulting from sending an "A", "B", or "D".
In this encoding, however:
Letter | Encoding |
A | 000 |
B | 111 |
the minimum Hamming distance between encodings is 3, which means that single-bit errors can be corrected, and double-bit errors detected. This is the (3,1) code from the previous section.
Generally speaking, an encoding can detect \(k\)-bit errors if the minimum Hamming distance is at least \(k+1\), and correct \(k\)-bit errors if the minimum Hamming distance is at least \(2k+1\).
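Under this rule, the minimum distance of a code, and hence its detecting and correcting ability, can be checked mechanically. The sketch below (reusing the Hamming-distance helper from above) applies it to the two example encodings: the four-letter code has minimum distance 2, and the two-codeword code has minimum distance 3.

```python
from itertools import combinations

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def min_distance(codewords):
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

for code in (["000", "011", "101", "110"], ["000", "111"]):
    d = min_distance(code)
    # Detects up to d-1 flipped bits; corrects up to floor((d-1)/2).
    print(code, "min distance:", d, "detects:", d - 1, "corrects:", (d - 1) // 2)
```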
Hamming codes take this idea, combined with the idea of parity bits, and allow the parity bits to overlap. More specifically, they follow this algorithm:
- When transmitting a binary number, use the 1st, 2nd, 4th, 8th, etc. (all powers of two) bits as parity bits, and all other positions as data bits.
- The parity bit at \(2^n\) is the sum of all bits (taken modulo 2) whose positions have the \(n\)th least significant bit set to 1. For example, the parity bit at 2 is the sum of the bits at positions \(3=11_2, 6=110_2, 7=111_2, 10=1010_2, 11=1011_2, \ldots\), since these positions all have the 2nd rightmost bit set to 1.
- Transmit this entire message. If an error occurred in a data bit, some of the parity bits will show an error (since there will be an odd number of 1s in some of their sets). The sum of the positions of those parity bits is the location of the error. If only one parity bit shows an error, that parity bit was itself in error.
The key point is that every bit position is covered by a unique set of parity bits: specifically, those parity bits whose positions appear as 1s in the binary representation of the bit's position. For example, the 14th bit is included in the parity bits at positions 8, 4, and 2, since \(14=1110_2\). If the parity bits at 8, 4, and 2 show an error, then the 14th bit is in error.
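Here is a minimal sketch of this positional scheme for 7-bit words (positions 1-7, with parity bits at positions 1, 2, and 4 and data bits at positions 3, 5, 6, and 7). Note that this interleaved layout arranges the bits differently from the generator-matrix construction in the next section, which places the four data bits first.

```python
def encode(data):
    # data: 4 bits as a string, e.g. "1011"; index 0 is unused so that list
    # indices match the 1-based positions used in the description above.
    word = [0] * 8
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = int(bit)
    for p in (1, 2, 4):
        # Parity bit at position p covers every other position with bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return "".join(str(b) for b in word[1:])

def find_error(received):
    # Returns the position of a single flipped bit, or 0 if every check passes.
    word = [0] + [int(b) for b in received]
    return sum(p for p in (1, 2, 4)
               if sum(word[i] for i in range(1, 8) if i & p) % 2 == 1)

sent = encode("1011")                       # "0110011"
print(find_error(sent))                     # 0: no error detected
corrupted = list(sent)
corrupted[5 - 1] = "1" if corrupted[5 - 1] == "0" else "0"
print(find_error("".join(corrupted)))       # 5: the flipped position
```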
Generally speaking, with \(n\) parity bits, \((2^n-1)-n=2^n-n-1\) bits are usable for data, meaning that up to \(2^{2^n-n-1}\) different members of an alphabet can be encoded using just \(n\) parity bits. This is a vast improvement on repetition codes when \(n>2\); for example, an encoding in which every word needs 4 bits of information (thus up to 16 codewords can be encoded) can be transmitted with 3 parity bits for a total of 7 bits, rather than the \(4 \cdot 3=12\) bits required by the repetition scheme in the previous section. Since it can still correct single-bit errors and detect double-bit errors, this makes Hamming codes far more efficient than repetition codes while achieving the same purpose.
Constructing Hamming codes
Hamming codes can be constructed by hand, but they are much easier to construct using linear algebra. Consider a matrix \(M\) whose columns are all the possible nonzero binary vectors whose length equals the number of parity bits. For example, with 3 parity bits, the matrix is
\[M=\begin{pmatrix}1&1&0&1&1&0&0\\1&0&1&1&0&1&0\\0&1&1&1&0&0&1\end{pmatrix}\]
The order of the columns doesn't matter, but it is convenient to write \(M\) in this form (with the rightmost 3 columns forming the \(3 \times 3\) identity matrix), as the true goal is to find the generator matrix \(G\), which satisfies \(MG^T=0\) with all arithmetic taken modulo 2. \(G\) can be formed by taking the transpose of the left-hand part of \(M\) (the part distinct from the identity matrix) and appending the appropriate identity matrix on the left:
\[G=\begin{pmatrix}1&0&0&0&1&1&0\\0&1&0&0&1&0&1\\0&0&1&0&0&1&1\\0&0&0&1&1&1&1\end{pmatrix}\]
It can easily be checked that this construction results in \(MG^T=0\) as desired. A Hamming code then takes a bit-vector \(\overrightarrow{x}\) to \(\overrightarrow{x}G\), with the arithmetic again taken modulo 2; e.g.
\[\begin{pmatrix}1&0&1&1\end{pmatrix} \rightarrow \begin{pmatrix}1&0&1&1\end{pmatrix}\begin{pmatrix}1&0&0&0&1&1&0\\0&1&0&0&1&0&1\\0&0&1&0&0&1&1\\0&0&0&1&1&1&1\end{pmatrix}=\begin{pmatrix}1&0&1&1&0&1&0\end{pmatrix}\]
so "1011" is encoded as "1011010".
This results in the Hamming(7,4) code, the original and most famous of the Hamming codes.
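As a quick sanity check of this construction (a sketch assuming NumPy; the variable names are illustrative), the matrices above can be multiplied modulo 2 to confirm \(MG^T=0\), reproduce the encoding of "1011", and show how the syndrome \(M\overrightarrow{r}\) (mod 2) locates a single flipped bit: since every column of \(M\) is a distinct nonzero vector, the syndrome matches exactly one column, namely the one at the error's position.

```python
import numpy as np

M = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

assert not (M @ G.T % 2).any()      # M G^T = 0 (mod 2), as claimed

x = np.array([1, 0, 1, 1])
codeword = x @ G % 2
print(codeword)                      # [1 0 1 1 0 1 0], i.e. "1011010"

received = codeword.copy()
received[2] ^= 1                     # flip the 3rd transmitted bit
syndrome = M @ received % 2
print(syndrome)                      # [0 1 1] -- the 3rd column of M
```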