### Knowledge and Uncertainty

So far, we've encountered the idea of information in the context of asking questions to find the truth. But we usually think of information as a written fact like "your grandma likes cookies but she's allergic to walnuts," not as the number of questions we need to ask someone to guess who they're thinking of.

Let's turn to this more familiar notion of information by studying the world's languages.

# What's the Fastest Language?

When we hear someone speak a language we don't know, it can sometimes seem that they're talking impossibly fast. But without knowing the language, we can't tell how much is actually being conveyed.

When you perceive that a language is particularly fast, what's the main thing you are picking up on?


One reason we could perceive that a language sounds fast is that its speakers pronounce syllables very quickly.

But we all know people who talk a lot without conveying much information or, as the famous saying goes,

"Your lips are moving, but you're not saying anything."

If we want to know how fast a speaker is giving us information, syllables per second isn't enough. What's the other quantity we need to know?


The syllable is the basic unit of pronunciation, so the speed at which syllables are spoken has a definite impact on the rate of information transmission.

But the other half of the equation is how much information is conveyed in each syllable.

The total amount of information delivered per second is then given by the product of these two factors:

\begin{aligned} &{\color{#69047E}\text{(Information per second)}\color{#333333}}\\ &={\color{#D61F06}\text{(Syllables per second)}} \color{#333333}\times {\color{#3D99F6}\text{(Information per syllable)}}. \end{aligned}
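As a quick worked example of this formula, the sketch below multiplies the two factors for two speakers. The rates are hypothetical placeholder values chosen only for illustration, not measurements of any real language, and "information" is taken to be measured in bits, the unit introduced in the next chapter.

```python
def information_per_second(syllables_per_second: float, info_per_syllable: float) -> float:
    """Information rate = (syllables per second) x (information per syllable)."""
    return syllables_per_second * info_per_syllable

# Hypothetical speakers: one talks fast with "light" syllables,
# the other talks slowly with information-dense syllables.
fast_talker = information_per_second(syllables_per_second=8.0, info_per_syllable=5.0)
slow_talker = information_per_second(syllables_per_second=5.0, info_per_syllable=8.0)

print(fast_talker, slow_talker)
```

Both hypothetical speakers deliver the same information rate even though one pronounces syllables 60% faster, which is why the syllable rate alone can't tell us how fast information is flowing.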

But how can we measure the "information" in a syllable?


In this course, we're going to see that, far from being vague, information is a mathematical quantity that we can put a number on, whether that "information" is in the $0$s and $1$s of a computer or in your family's secret dosa recipe.

Actual family secret dosa recipe. Please don't steal.

Let's say you're talking to your mom in Tamil and she's telling you the first step of the dosa recipe, but there's static, so you miss one of the syllables. How much information does that syllable contain?


The amount of information in that syllable depends on what you know already.

For example, say you missed a syllable so that what you hear is

$\text{உ-}\color{#D61F06}{?}\color{#333333}\text{-து\ \ தண்ணீரில்\ \ போடு},$

which means "Put the u-$\color{#D61F06}{?}$-du in the water."

• If you're familiar with dosa, you know that the only acceptable ingredient to use in the recipe is u-lun-du. Therefore, when you hear "u-$\color{#D61F06}{?}$-du," you know what she meant. In this case the missing syllable contains no information at all.

The only way to make dosa, duh.

• But if your dosa knowledge isn't great, it won't be obvious what the missing syllable is supposed to be. You might wonder if your mom said "put the u-ru-du in the water." (Urudu, or Urdu, is an official language of Pakistan.)

The secret ingredient: words of encouragement

In this second case (where you aren't a good cook), does the missing syllable contain information for you?
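One way to put a number on this, in the spirit of counting questions, is a minimal sketch that assumes every candidate syllable you consider is equally likely. Under that assumption the information revealed by the missing syllable is $\log_2 n$ bits for $n$ candidates, which equals the number of yes/no questions needed when $n$ is a power of two. The candidate lists and the function name `missing_syllable_bits` are illustrative assumptions.

```python
import math

def missing_syllable_bits(candidates: list[str]) -> float:
    """Bits revealed by learning the missing syllable, assuming
    every candidate is equally likely: log2(number of candidates)."""
    return math.log2(len(candidates))

# A dosa expert knows that only "lun" fits in "u-?-du".
expert = missing_syllable_bits(["lun"])
# A novice (hypothetically) can't decide between u-lun-du and u-ru-du.
novice = missing_syllable_bits(["lun", "ru"])

print(expert, novice)
```

For the expert the syllable carries no information, while for the novice it carries one bit: a single yes/no question ("Was it *lun*?") settles it.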


Some languages can pack a lot more information into every syllable spoken. How much a syllable can carry depends largely on the number of distinct syllables the language permits.

For example, English has $6949$ distinct syllables, such as the $6$ found in "col-lo-qui-al-is-m" or the $1$ in "slang."

Japanese, in contrast, has far fewer distinct syllables $(643)$ for several reasons, including that certain consonant sounds (like "l" and "v") do not occur in the language. For example, in the loanword (a word borrowed from another language, here English) スラング (pronounced "su-ran-gu," from the English "slang"), the "l" sound is replaced with an "r."

The table below shows how widely the number of distinct syllables varies across $17$ languages:

$\begin{array}{lr} \textbf{Language} & \textbf{Syllables} \\[1em] \text{English (UK)} & 6949 \\ \text{German} & 5100 \\ \text{Hungarian} & 4325 \\ \text{Finnish} & 3844 \\ \text{Serbian} & 3831 \\ \text{Catalan} & 3600 \\ \text{Turkish} & 3260 \\ \text{French} & 2949 \\ \text{Spanish} & 2778 \\ \text{Vietnamese} & 2776 \\ \text{Italian} & 2729 \\ \text{Thai} & 2428 \\ \text{Basque} & 2082 \\ \text{Cantonese} & 1298 \\ \text{Mandarin Chinese} & 1274 \\ \text{Korean} & 1104 \\ \text{Japanese} & 643 \end{array}$
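To see why a larger inventory lets a syllable carry more, here is a rough sketch. It assumes every syllable in a language is equally likely, which real languages are far from, so the result is only an upper bound and not an estimate: one syllable chosen from $N$ equally likely possibilities carries $\log_2 N$ bits. The counts are copied from the table; the equal-likelihood assumption and the helper name `max_bits_per_syllable` are ours.

```python
import math

# Distinct syllable counts, copied from the table above.
SYLLABLE_COUNTS = {
    "English (UK)": 6949,
    "German": 5100,
    "Hungarian": 4325,
    "Finnish": 3844,
    "Serbian": 3831,
    "Catalan": 3600,
    "Turkish": 3260,
    "French": 2949,
    "Spanish": 2778,
    "Vietnamese": 2776,
    "Italian": 2729,
    "Thai": 2428,
    "Basque": 2082,
    "Cantonese": 1298,
    "Mandarin Chinese": 1274,
    "Korean": 1104,
    "Japanese": 643,
}

def max_bits_per_syllable(n_syllables: int) -> float:
    """Upper bound on information per syllable: log2(N),
    reached only if all N syllables were equally likely."""
    return math.log2(n_syllables)

bounds = {lang: max_bits_per_syllable(n) for lang, n in SYLLABLE_COUNTS.items()}

# Print languages from the highest bound to the lowest.
for lang, bits in sorted(bounds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<18} {bits:5.2f} bits")
```

Under this crude bound, English tops out at about 12.8 bits per syllable and Japanese at about 9.3. Because of the logarithm, a tenfold difference in inventory size shrinks to a gap of only a few bits.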


There are many more one-syllable words, like "mouse," in English than there are in Japanese. More generally, because English has more distinct syllables, it can often express an idea in fewer syllables than the equivalent Japanese word needs.

For example, "mouse" is three syllables in Japanese (ねずみ, "ne-zu-mi"). To get a sense of this, we can count the syllables in the names of the $12$ Zodiac animals in English and Japanese and compare the averages. For this set, English averages $\sim 1.4$ syllables per word, while Japanese averages $\sim 2.4.$

If the English word for an animal is "mouse" $(1$ syllable$)$ and the Japanese is "ne-zu-mi" $(3$ syllables$),$ which of these has more information per syllable?

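| Animal | English syllables | Japanese syllables |
|---|---|---|
| 🐭 | $1$ | $3$ |
| 🐮 | $1$ | $2$ |
| 🐯 | $2$ | $2$ |
| 🐰 | $2$ | $3$ |
| 🐲 | $2$ | $2$ |
| 🐍 | $1$ | $2$ |
| 🐴 | $1$ | $2$ |
| 🐐 | $1$ | $3$ |
| 🐵 | $2$ | $2$ |
| 🐔 | $2$ | $2$ |
| 🐶 | $1$ | $2$ |
| 🐷 | $1$ | $4$ |
| **Average** | $\sim 1.4$ | $\sim 2.4$ |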
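The averages in this table can be checked with a few lines of code. The sketch also makes explicit an assumption behind the question above: if each animal name conveys the same information in both languages, then information per syllable is inversely proportional to the number of syllables used, at least for these twelve words.

```python
from fractions import Fraction

# Syllable counts for the 12 Zodiac animals, in the order of the table.
english = [1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
japanese = [3, 2, 2, 3, 2, 2, 2, 3, 2, 2, 2, 4]

# Exact averages as fractions (17/12 and 29/12).
avg_en = Fraction(sum(english), len(english))
avg_ja = Fraction(sum(japanese), len(japanese))

# Assumption: each animal name carries the same information in both
# languages, so English information per syllable relative to Japanese
# is the ratio of syllables used.
ratio = avg_ja / avg_en

print(float(avg_en), float(avg_ja), float(ratio))
```

For this small sample, each English syllable carries roughly $1.7$ times as much information as a Japanese one; for "mouse" versus "ne-zu-mi" alone, the factor is $3$.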


Does this mean that languages like English are faster than others at conveying information?

When it comes down to it, the human brain is a computer of sorts: it processes the information it hears into concepts. Because all humans share the same hardware, every language should face similar constraints as it develops.

If that's true, then we could make the following hypothesis:

Hypothesis: "Information per second" is actually roughly the same in all languages.

According to this hypothesis, should English have a high or a low "syllable rate" compared to Japanese?

Hint: Remember that our formula for the information rate of a language had two factors:

\begin{aligned} &{\color{#69047E}\text{(Information per second)}\color{#333333}}\\ &={\color{#D61F06}\text{(Syllables per second)}} \color{#333333}\times {\color{#3D99F6}\text{(Information per syllable)}}. \end{aligned}
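Rearranging the formula makes the hypothesis's prediction concrete: if information per second is the same constant for both languages, then syllables per second equals that constant divided by information per syllable. The sketch below is illustrative only. It assumes each Zodiac name carries the same information in both languages and uses the $\sim 1.4$ and $\sim 2.4$ syllables-per-word averages as a rough proxy; the shared rate `RATE` is an arbitrary unit that cancels out.

```python
# Illustrative assumptions (not measurements):
RATE = 1.0           # information per second, equal in both languages (the hypothesis)
INFO_PER_WORD = 1.0  # each Zodiac name carries the same information in both languages
SYLLABLES_PER_WORD = {"English": 1.4, "Japanese": 2.4}  # Zodiac averages

def syllables_per_second(info_per_syllable: float, rate: float = RATE) -> float:
    """Rearranged formula: (syllables/s) = (information/s) / (information/syllable)."""
    return rate / info_per_syllable

info_per_syllable = {lang: INFO_PER_WORD / n for lang, n in SYLLABLES_PER_WORD.items()}
syl_rate = {lang: syllables_per_second(i) for lang, i in info_per_syllable.items()}

ratio = syl_rate["Japanese"] / syl_rate["English"]
print(f"Japanese would need about {ratio:.2f} times the English syllable rate")
```

With equal information rates, the language with less information per syllable must pronounce syllables faster, and the shared rate drops out of the ratio entirely.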


Suppose we wanted to study this question. We could translate the same paragraph into different languages and record them as spoken by native speakers.

By using the same source material for every language, we could ensure that each recording conveys the same information.

If our hypothesis is true, then what would you expect to find?

Hypothesis: "Information per second" is actually roughly the same in all languages.


In fact, we did just that.

We translated the following paragraph into the native languages of several Brilliant employees and recorded each of them reading it aloud:

"I’ve always found it difficult to sleep on long train journeys. For one thing, I can never make myself comfortable in the seat. Then the other passengers usually talk so loudly, or worse still they snore. In addition, there’s the constant clickety-clack of the wheels on the track. If I do manage to doze off, the ticket inspector comes along and wakes me."

English, 🇦🇺, 19s

Swedish, 🇸🇪, 19s

Mandarin, 🇨🇳, 19s

Korean, 🇰🇷, 19s

Russian, 🇷🇺, 20s

Tamil, 🇱🇰, 21s

Japanese, 🇯🇵, 23s

The passage conveys the same information in each of the languages. So if every language delivers the same amount of information per second, we'd expect all the speakers to finish at around the same time.
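A quick way to summarize these office recordings is to compute their mean and compare each reading against it. Since the passage carries the same information in every language, each speaker's information rate is proportional to $1/\text{duration}$. The durations come from the list above; treating the rounded second counts as exact is our simplification.

```python
from statistics import mean

# Recording durations (seconds) for the same passage, from the list above.
durations = {
    "English": 19, "Swedish": 19, "Mandarin": 19, "Korean": 19,
    "Russian": 20, "Tamil": 21, "Japanese": 23,
}

avg = mean(durations.values())

# Same information in every recording, so information rate ~ 1 / duration.
# Express each speaker's rate relative to a speaker of average duration.
relative_rate = {lang: avg / t for lang, t in durations.items()}

for lang, r in relative_rate.items():
    print(f"{lang:<9} {r:.2f}x the average information rate")
```

The mean is exactly $20$ seconds, and even the slowest reading took only $15\%$ longer than that.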


While a survey of a few people in the office is a nice starting point for testing the hypothesis, it shouldn't be enough evidence to convince us.

Most of the recitations lasted roughly $20$ seconds, but to what extent was that luck?

To get serious, we'd need to make a study with more languages, a larger sample of speakers for each language, and a variety of texts.

Thankfully, that's exactly what this study of $17$ languages did, with $5$ male and $5$ female speakers for each language and $15$ different texts (including the one about trains that we just read).

Does the evidence support our hypothesis?


These are the results:

Strikingly, all of these languages have very similar durations for the same texts, which suggests that they all convey information at roughly the same rate.

As we'll see in the next chapter, information can be quantified in units called bits. For all of these languages, the information rate clustered at around $39 \pm 5\,\text{bits per second}.$


That's a wrap for our introduction, but there's a lot more to say. In the rest of the course, we'll explore:

• the quantitative measure of information $($Chapter 2$),$
• how to update our beliefs in light of new information $($Chapter 3$),$ and
• how to use information and Bayesian reasoning to identify cause-effect relationships $($Chapter 4$).$

Nedivi et al., PLoS Biology 2005

The concept of information is so fundamental that it has quietly revolutionized everything from how we decode the genome, to the learning algorithms that underlie our digital lives, to how we understand information processing by the brain.

Let's get started.
