So far, we've encountered the idea of information in the context of asking questions to find the truth. But we usually think of information as a written fact like "your grandma likes cookies but she's allergic to walnuts," not as the number of questions we need to ask someone to guess who they're thinking of.
Let's turn to this more familiar notion of information by studying the world's languages.
When we hear someone speak a language that we don't know, it can seem that they're talking impossibly fast. But without knowing the language, we can't tell how much is actually being conveyed.
When you perceive that a language is particularly fast, what's the main thing you are picking up on?
One reason we could perceive that a language sounds fast is that its speakers pronounce syllables very quickly.
But we all know people who talk a lot without conveying much information or, as the famous saying goes,
"Your lips are moving, but you're not saying anything."
If we want to know how fast a speaker is giving us information, syllables per second isn't enough. What's the other quantity we need to know?
The syllable is the basic unit of pronunciation, so the speed at which syllables are spoken has a definite impact on the rate of information transmission.
But the other half of the equation is how much information is conveyed in each syllable.
The total amount of information delivered per second is then given by the product of these two factors:
information per second = (syllables per second) × (information per syllable)
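As a rough sketch of this product, the snippet below multiplies a syllable rate by an amount of information per syllable. The function name `information_rate` and all of the numbers are hypothetical, chosen only for illustration, and the unit of information is deliberately left unspecified.

```python
def information_rate(syllables_per_second, info_per_syllable):
    """Information delivered per second: the product of the two factors."""
    return syllables_per_second * info_per_syllable


# Two hypothetical speakers (made-up numbers, arbitrary information units).
fast_but_sparse = information_rate(syllables_per_second=8.0, info_per_syllable=0.5)
slow_but_dense = information_rate(syllables_per_second=4.0, info_per_syllable=1.0)

print(fast_but_sparse, slow_but_dense)
```

In this made-up case, the fast talker and the slow talker deliver information at the same rate, which is why syllables per second alone can't settle the question.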
But how can we measure the "information" in a syllable?
In this course, we're going to see that, far from being vague, information is a mathematical quantity that we can put a number on, whether that "information" is in the 0s and 1s of a computer or in your family's secret dosa recipe.
Actual family secret dosa recipe. Please don't steal.
Let's say you're talking to your mom in Tamil and she's telling you the first step of the dosa recipe, but there's static and so you miss one of the syllables. How much information does that syllable contain?
The amount of information in that syllable depends on what you know already.
For example, say you missed a syllable, so that what you hear translates to "Put the u--du in the water."
The only way to make dosa- duh.
The secret ingredient: words of encouragement
In this second case (where you aren't a good cook), does the missing syllable contain information for you?
Some languages can pack a lot more information into every syllable spoken. That's largely determined by the number of permitted syllable sounds in that language.
For example, English has a large number of distinct syllable sounds, such as those found in "col-lo-qui-al-is-m" or the single syllable "slang."
Japanese, in contrast, has fewer syllable sounds for several reasons, including that certain consonant sounds (like "l" and "v") do not occur in the language. For example, in the loanword スラング (pronounced "su-ran-gu," imported from the English "slang"), the "l" sound is replaced with an "r."
The table below shows how widely the number of syllables varies in languages:
There are many more one-syllable words, like "mouse," in English than there are in Japanese. In general, because English has more distinct syllables to choose from, it should be possible to express an idea in fewer syllables in English than in Japanese.
For example, the word for "mouse" is three syllables in Japanese (ねずみ "ne-zu-mi"). To get a sense for this, we can count the syllables in the names of the Zodiac animals in English and Japanese and compare the averages. For this set, English averaged fewer syllables per word than Japanese.
If the English word for an animal is "mouse" (1 syllable) and the Japanese is "ne-zu-mi" (3 syllables), which of these has more information per syllable?
Animal | English syllables | Japanese syllables |
🐭 | ||
🐮 | ||
🐯 | ||
🐰 | ||
🐲 | ||
🐍 | ||
🐴 | ||
🐐 | ||
🐵 | ||
🐔 | ||
🐶 | ||
🐷 | ||
average |
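One way to make the comparison concrete is to count hyphen-separated syllables, as in the "ne-zu-mi" spelling above. This is only a minimal sketch: the helper names are hypothetical, and it assumes that each word carries the same total information (one animal name), measured as 1 arbitrary unit per word.

```python
def count_syllables(hyphenated_word):
    """Count syllables in a word written with hyphens between syllables."""
    return len(hyphenated_word.split("-"))


def info_per_syllable(hyphenated_word, info_per_word=1.0):
    """Spread a word's information evenly over its syllables.

    Assumption: every word carries `info_per_word` units of information.
    """
    return info_per_word / count_syllables(hyphenated_word)


english = "mouse"      # one syllable
japanese = "ne-zu-mi"  # three syllables

print(count_syllables(english), count_syllables(japanese))
print(info_per_syllable(english) > info_per_syllable(japanese))
```

Under that equal-information assumption, the shorter word packs more information into each syllable, simply because the same total is divided among fewer syllables.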
Does this mean that languages like English are faster than others at conveying information?
When it comes down to it, the human brain is a computer of sorts — it processes the information it hears into concepts. Because we all have the same hardware, different languages should experience similar constraints as they develop.
If that's true, then we could make the following hypothesis:
Hypothesis: "Information per second" is actually roughly the same in all languages.
According to this hypothesis, should English have a high or a low "syllable rate" compared to Japanese?
Hint: Remember that our formula for the information rate of a language had two factors:
information per second = (syllables per second) × (information per syllable)
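If information per second really is the same everywhere, the formula (information per second as syllables per second times information per syllable) can be rearranged to give the syllable rate a language would need. The sketch below assumes a shared target rate. The function name and every value are hypothetical placeholders, not measurements of any real language.

```python
def required_syllable_rate(target_info_per_second, info_per_syllable):
    """Rearranged formula: syllables/second = (info/second) / (info/syllable)."""
    if info_per_syllable <= 0:
        raise ValueError("info_per_syllable must be positive")
    return target_info_per_second / info_per_syllable


TARGET = 6.0  # assumed to be the same for every language (arbitrary units)

# Hypothetical languages: one packs more into each syllable than the other.
dense_language = required_syllable_rate(TARGET, info_per_syllable=1.5)
sparse_language = required_syllable_rate(TARGET, info_per_syllable=0.75)

print(dense_language, sparse_language)
```

With the product held fixed, halving the information per syllable doubles the syllable rate needed to keep up.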
Suppose we wanted to study this question. We could translate the same paragraph into different languages and record them as spoken by native speakers.
By using the same source material for all the languages, we could ensure that the spoken word conveys the same information.
If our hypothesis is true, then what would you expect to find?
In fact, we did just that.
We used translations of the following paragraph in the native languages of various employees at Brilliant and recorded them speaking it:
"I’ve always found it difficult to sleep on long train journeys. For one thing, I can never make myself comfortable in the seat. Then the other passengers usually talk so loudly, or worse still they snore. In addition, there’s the constant clickety-clack of the wheels on the track. If I do manage to doze off, the ticket inspector comes along and wakes me."
English, 🇦🇺, 19s
Swedish, 🇸🇪, 19s
Mandarin, 🇨🇳, 19s
Korean, 🇰🇷, 19s
Russian, 🇷🇺, 20s
Tamil, 🇱🇰, 21s
Japanese, 🇯🇵, 23s
The passage conveys the same information in each of the languages. That's why, if the amount of information per second is equal in each language, we'd expect people to finish speaking at around the same time.
While a survey of a few people in the office is a nice starting point for testing the hypothesis, it shouldn't be enough evidence to convince us.
Most of the recitations took roughly 20 seconds, but to what extent was that luck?
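To see how tightly the recordings cluster, we can summarize the seven durations listed above. This is a small sketch using Python's standard `statistics` module. It only describes this handful of recordings and is not a statistical test of the hypothesis.

```python
from statistics import mean, pstdev

# Recording durations in seconds, as listed above.
durations = {
    "English": 19,
    "Swedish": 19,
    "Mandarin": 19,
    "Korean": 19,
    "Russian": 20,
    "Tamil": 21,
    "Japanese": 23,
}

values = list(durations.values())
average = mean(values)            # average duration in seconds
spread = pstdev(values)           # standard deviation of these seven values
relative_spread = spread / average

print(f"average: {average} s")
print(f"spread: {spread:.2f} s ({relative_spread:.0%} of the average)")
```

A small spread relative to the average is suggestive, but with one speaker per language it can't separate a real pattern from chance.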
To get serious, we'd need to make a study with more languages, a larger sample of speakers for each language, and a variety of texts.
Thankfully, that's exactly what one study of multiple languages did, with both male and female speakers for each language and several different texts (including the one about trains that we just read).
Does the evidence support our hypothesis?
These are the results:
Incredibly, all of these languages seem to have remarkably similar durations, which means that they all convey information at roughly the same rate.
As we'll see in the next chapter, information can be quantified in units called bits. For all of these languages, the information rate clustered at around
That's a wrap for our introduction, but there's a lot more to say. In the rest of the course, we'll discover these:
Nedivi et al., PLoS Biology 2005
The concept of information is so fundamental that it has quietly revolutionized everything from how we decode the genome, to the learning algorithms that underlie our digital lives, to how we understand information processing by the brain.
Let's get started.