# Using Statistics to Break Codes

Here is a mysterious message:

J MPWF QZUIPO

Can you decode it? What does it say?

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

The "letter-shift rule" from the previous problem is an instance of one of the very first secret ciphers used in history: the Caesar cipher. (Julius Caesar himself used it in his private notes.) It replaces each letter by another letter a fixed number of places further in the alphabet. For example, in a Caesar cipher with shift 2, A becomes C, B becomes D, C becomes E, and so on. The last letters Y and Z "wrap around", becoming A and B, respectively:

If in a given Caesar cipher, A gets encoded as M, as what letter does P get encoded?

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

One very important application of programming is the analysis and visualization of data. These can be experimental measurements, survey results, images, video, network traffic, …

In recent years, Python has become incredibly popular among those who deal professionally with large amounts of data, because it is a simple yet enormously powerful language. It gains this power from thousands of freely available packages of commands, known as modules. You have used one such module, called turtle, already for drawing. Now you get a peek at a more advanced drawing module, called matplotlib, that is professionally used for data visualization.

In this quiz, you will visualize the distribution of letters in a coded message in order to decipher it.

Distribution of letters in a typical English text: E is the most frequently used letter.

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

The following program implements a Caesar shift of 2 and applies it to the first few lines of The Raven by E. A. Poe. What letter appears most often in the coded text?

Hint: What is the most frequent letter in a typical text in plain English?

alphabet = "abcdefghijklmnopqrstuvwxyz"

# convert between letters and numbers up to 26
def number_to_letter(i):
return alphabet[i%26] # %26 does the wrap-around

def letter_to_number(l):
return alphabet.find(l) # index in the alphabet

# How to encode a single character (letter or not)
def caesar_shift_single_character(l, amount):
i = letter_to_number(l)
return "" # remove it, it's space or punctuation
else:
return number_to_letter(i + amount) # Caesar shift

# How to encode a full text
def caesar_shift(text, amount):
shifted_text = ""
for char in text.lower(): # also convert uppercase letters to lowercase
shifted_text += caesar_shift_single_character(char, amount)
return shifted_text

### MAIN PROGRAM ###

message = """
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door—
"'Tis some visitor," I muttered, "tapping at my chamber door—
Only this and nothing more."
"""

code = caesar_shift(message, 2)
print(code)
Python 3
You need to be connected to run code

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

Below you see two strings of letters. Both seem random, but one of them is a meaningful text encoded with a Caesar cipher. One way of telling coded messages apart from random noise is to look at the letter frequencies: if a few letters appear significantly more often than the rest, as is usually the case in written language, then the text is most likely not randomly generated.

Text 1:

 1 2 3 4 5 6 7 8 yfdpcpoplhhwdpssbjnsqvtlcpzpxqugtjphvgotuvwxufgoqigxwgkskduooyeuoue fjlnmsqpgxrmcseeliswdheywseqgcbeothskxdzekgxmmkildjnaqbukprpfaaknsu qpdwayqaqfxsoapvsgreqydqjnkpjghvrkygtidzibhrqkmocukhcunpjcazzvomtsc fgycwfltmiegaejwcqrgsnxxcbtcrckufwsdxdhbxgppxcuzapbdhftzmugryfseavv bssqlxanvmfwwzityziixasivzkmvtfczqmdgkabcnjbyhaoealengfptuedlmvryeb titbwqkekzdpmbtiphdkwwiduassvbgalxgrfhrjrjplxpujrprqzcpcdqsjorigazt kwwlnwbjryrzhgcttroyemuwwixwufymnknirzmexyowobvardlqktzajzoijwulomg ztefdpftjealzapcgipgaaspuzxklvd 

Text 2:

 1 2 3 4 5 6 7 8 swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn ccdbodmrpkbkgki 

To help you decide which text is which, here is a program that can show how often each letter appears as a bar graph. Copy each text into the indicated line and run the program to see it.

Which text contains a secret message?

import matplotlib.pyplot as plt
alphabet = "abcdefghijklmnopqrstuvwxyz"

code = """
paste the text here
"""

letter_counts = [code.count(l) for l in alphabet]
letter_colors = plt.cm.hsv([0.8*i/max(letter_counts) for i in letter_counts])

plt.bar(range(26), letter_counts, color=letter_colors)
plt.xticks(range(26), alphabet) # letter labels on x-axis
plt.tick_params(axis="x", bottom=False) # no ticks, only labels on x-axis
plt.title("Frequency of each letter")
plt.savefig("output.png")

Copy and paste either text here to replace this placeholder text. (Make sure not to remove the triple quotes.)

Python 3
You need to be connected to run code

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

If we know that a particular text has been encoded using a Caesar cipher, there still remains the task of finding out the amount by which the letters have been shifted. One method is to look at the most frequent letter in the text: it probably stands for the most frequent letter in the plain message, which in English is usually E. From that you can deduce the shift and decode the message.

What needs to be the case for this method to work with reasonable chance of success?

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

Here is the Caesar-shifted secret message from before. Judging from its letter distribution, by how many places has it been shifted?

 1 2 3 4 5 6 7 8 swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn ccdbodmrpkbkgki 

import matplotlib.pyplot as plt
alphabet = "abcdefghijklmnopqrstuvwxyz"

code = """
paste the text here
"""

letter_counts = [code.count(l) for l in alphabet]
letter_colors = plt.cm.hsv([0.8*i/max(letter_counts) for i in letter_counts])

plt.bar(range(26), letter_counts, color=letter_colors)
plt.xticks(range(26), alphabet) # letter labels on x-axis
plt.tick_params(axis="x", bottom=False) # no ticks, only labels on x-axis
plt.title("Frequency of each letter")
plt.savefig("output.png")

Copy and paste either text here to replace this placeholder text. (Make sure not to remove the triple quotes.)

Python 3
You need to be connected to run code

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

Now it is time to crack the code! From the shift determined in the previous question (10), you can decode the message by applying the reverse shift (i. e. by –10). Below is the relevant program from the start of the quiz. Edit it accordingly.

 1 2 3 4 5 6 7 8 swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn ccdbodmrpkbkgki 

alphabet = "abcdefghijklmnopqrstuvwxyz"

# convert between letters and numbers up to 26
def number_to_letter(i):
return alphabet[i%26] # %26 does the wrap-around

def letter_to_number(l):
return alphabet.find(l) # index in the alphabet

# How to encode a single character (letter or not)
def caesar_shift_single_character(l, amount):
i = letter_to_number(l)
return "" # remove it, it's space or punctuation
else:
return number_to_letter(i + amount) # Caesar shift

# How to encode a full text
def caesar_shift(text, amount):
shifted_text = ""
for char in text.lower(): # also convert uppercase letters to lowercase
shifted_text += caesar_shift_single_character(char, amount)
return shifted_text

### MAIN PROGRAM ###

message = """
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door—
"'Tis some visitor," I muttered, "tapping at my chamber door—
Only this and nothing more."
"""

code = caesar_shift(message, 3)
print(code)

Adjust the amount of the Caesar shift here.

Python 3
You need to be connected to run code

## Using Statistics to Break Codes

### Introduction

# Using Statistics to Break Codes

Code-breaking is just one application of handling (text) data and visualizing it. Later chapters dive deeper into a plethora of data analysis techniques, as well as other aspects of data science such as randomness and simulations.

## Using Statistics to Break Codes

### Introduction

×