Kolmogorov Complexity

Kolmogorov complexity of an object or algorithm is the length of its optimal specification. In some sense, it could be thought of as algorithmic entropy, in the sense that it is the amount of information contained in the object.

This is a screenshot from the game .kkrieger.

Given that graphics, the game must be of at least a few hundred megabytes, right? Well, it is just 96 kilobytes.

Informal Introduction

Consider the strings 11111111111111111111111111111111 and 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7.

The most obvious way to describe the first string is to say that it contains 32 1's.

1	`take 32 $ cycle "1"`

The description contains fewer characters than the string itself.

On the other hand, the best way to describe the other string in haskell is (probably) to just site it.

This is why the second string is actually more complex.

The complexity of describing something in a rich language such as English is not unknown to anyone aware of the Berry paradox.

The Berry paradox claims that the complexity of all integers in English is at most eleven. It claims that if were not, then there would be a first integer that is not. But, then again it could be described as The smallest positive integer not definable in under eleven words.

This could be viewed as one of the intuitive reasons for the uncomputablity of Kolmogorov complexity.

If you are interested in code golf, procedural generation or compression theory, your field has got something to do with Kolmogorov complexity.

Definition

Kolmogorov complexity $K$ of a string, relative to a Turing machine $f$ of a string $x,$ is

\[ K_f(x) = \min \{ |p|: f(p) =x \}. \]

It appears that the complexity depends both on $f$ as well as $x$. This is, to some extent, unavoidable in the sense that the description really depends on the description language.

However, if the Turing machine is universal, the differences between the complexity of the string across such a Turing machine would be varying by constants.

This is known as the invariance theorem. The proof follows from the idea of encoding a Turing machine and then the program in that Turing machine. The first part (i.e. the encoded Turing machine) would not depend upon the string and is hence a constant.

Incompressiblity and Randomness

$x$ is incompressible if and only if

\[K(x) \geq x. \]

An incompressible string could also be called algorithmically random.

For all $n$, there exists incompressible strings of length greater than or equal to $n.$

The proof is just a trivial usage of the pigeonhole principle.

There are $2^n$ strings of $n$ bits, but there are $2^n - 1$ descriptions of size less than $n.$ This is because

\[\sum_{i=0}^{n-1} {2^i} = 2^n - 1.\ _\square \]

It is not very hard to see that a majority of the strings are incompressible. However, we will soon see that incompressibility is undecidable, owing to the nature of uncomputability of Kolmogorov complexity.

By this time, you should be able to see why your compression software compresses down some stuff really well, while others do not that well. You should understand why writing a nice compression algorithm is really hard and interesting.

Uncomputability

We need to begin with a definition:

The complexity of a formal language is the complexity of a program that lists all the strings in the language.

$K(\cdot)$ is uncomputable.

No formal system of complexity $n$ can prove that an object has $K(x) > n.$

Suppose not. In that case, the formal system would be able to prove that the object has complexity $n'$. But then this encodes a description of the object in $n$ bits. Since $n' > n $, this is a contradiction.

Since the complexity of any program is finite, it follows that there is a string of higher complexity than the program whose complexity it cannot decide. $_\square$

If this proof does not convince you, examine the following informal proof:

Suppose there is a function kolmogorov(x) whose definition is of $n$ bits.

Now, we could define a program of length $m$ bits like this:

for i = 1 to infinity
   foreach string x of length i
      if kolmogorov(x) > m+n
         print x

This program is, by construction, supposed to output all strings of complexity more than $m+n$. Since the complexity of the program is exactly $m+n$, we have a contradiction since that would really mean that those strings are not of complexity more than $m+n$ because there is a less complex way to output them.

No, he is attempting the impossible. Yes, but it will take exponential time anyway. Yes, but it will take exponential time unless P=NP. No, BF cannot output all computable strings.

Today is Alokananda's birthday.

As a tribute, Agnishom plans to generate a BF program that outputs the string Happy Birthday!

He sees that the following program would be the simplest for him to understand: link.

But then after playing around a bit, he discovers that is too profuse and can be done with a simpler code:

1	`-[------->+<]>-.[--->++++<]>+.-[++>-----<]>..+++++++++.-[---->+<]>++.+[->++<]>.-[--->+<]>--.+++++++++.++.------------.----.---.+[--->+<]>+++.-[---->+<]>+++.`

This inspires him to find the shortest BF program that prints Happy Birthday!.

In general, he is willing to write a (Python) program that outputs the shortest BF program that prints a given string.

Will he succeed?

Assume we are running BF on an infinite cell memory and furthermore, our brains are not super-turing.

If you have been following along, you should see this is a version of Berry paradox, in essence.

The philosophical question of whether humans could decide incompressibility or the halting problem remains. If they were, that would mean that a brain is an object of infinite complexity.

Incompressibility Proof Method

This section needs to be expanded.

This is a generic proof technique that relies on the fact that there exists incompressible strings of any length. This is used to show that a particular property in a class holds by showing that the contrary would imply the non-existence of arbitrary-length random strings.

Prove that there are infinitely many primes.

Contents