Y cn rd ths jst fn

If you could read the title just fine, it is because the English language (like all natural languages) is redundant. This isn't to say that there are multiple words that mean the same thing (although there are), but that if you compare the number of questions you would have to ask to uniquely identify a word I'm thinking of with the number of possible words, the second number is much bigger than the first.

For example, by the time you have read the first four letters of a word starting with "calc", you can be pretty sure it's going to end up being calculus, calcium, calculate, or calculation. You don't need all the extra letters to distinguish the remaining possibilities. Concretely, if we take a list of all the words of a given length that exist and sort them into alphabetical order, we only need about \(\log_2 N\) yes-or-no questions to identify any given word in the list (where \(N\) is the number of words in the list), since each question can halve the remaining candidates. On the other hand, if we were making full use of the alphabet, we could manage \(\displaystyle 26^L\) unique words of length \(L\).
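To make the comparison concrete, here is a minimal sketch of the two quantities. The function names and the sample values (a list of 1024 words, words of length 3) are illustrative assumptions and do not come from the problem itself.

```python
import math


def questions_needed(n_words: int) -> float:
    """Yes-or-no questions needed to single out one word in a sorted list.

    Each question halves the candidates, so this is log2(N). In practice
    you would round up to a whole number of questions.
    """
    return math.log2(n_words)


def possible_words(length: int, alphabet_size: int = 26) -> int:
    """Number of distinct letter strings of the given length: 26^L by default."""
    return alphabet_size ** length


# Hypothetical values, chosen only for illustration.
n = 1024
length = 3

print(questions_needed(n))       # 10.0
print(possible_words(length))    # 17576
```

The gap between the strings a language could use and the words it actually uses is what the redundancy argument rests on.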

Using the English language dictionary built into UNIX operating systems, and filtering for words of length 5, I find 10230 unique words. Taking words of length 5 as a proxy for the entire English language, how short, on average, could we make five-letter words before someone with perfect reasoning couldn't read them anymore?
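The filtering step could look something like the sketch below. The helper name `unique_words_of_length` and its cleaning rules (lowercasing, keeping only plain ASCII letters) are assumptions. The problem does not say how the 10230 figure was obtained, so a different dictionary file or different rules may give a different count.

```python
from typing import Iterable


def unique_words_of_length(words: Iterable[str], length: int) -> set[str]:
    """Return the distinct words of exactly `length` letters.

    Assumed cleaning rules: strip whitespace, fold to lowercase, and keep
    only words made entirely of ASCII letters (so "it's" and "naïve" are
    dropped).
    """
    cleaned = (w.strip().lower() for w in words)
    return {w for w in cleaned if len(w) == length and w.isascii() and w.isalpha()}


# Small in-memory sample standing in for a dictionary file.
sample = ["Apple", "apple", "bread", "calc", "it's", "crane\n", "naïve"]
five = unique_words_of_length(sample, 5)
print(sorted(five))  # ['apple', 'bread', 'crane']
```

On many UNIX-like systems the word list lives at `/usr/share/dict/words`, so passing an open file handle for that path as `words` would be one way to run the same filter on a real dictionary. The exact location and contents vary between systems.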
