Primality Testing

Prime numbers are of immense importance in cryptography, computational number theory, information science and computer science. There are several algorithms to test if a number is prime. Some of them are fast, but no fast algorithm to factorize a number is known.

A primality test is deterministic if it outputs True when the number is a prime and False when the input is composite with probability 1. Otherwise, the primality test is probabilistic. A probabilistic primality test is often called a pseudoprimality test.

In the following, $n$ is assumed to be a positive integer greater than $1$ .

Trial Division

This is one of the simplest deterministic primality tests, by naively checking the condition of a number being prime. It uses the fact that a prime number is not divisible by any other positive integer other than itself and $1$ , so we can turn it into its contrapositive: if a number is divisible by some other positive integer besides itself and $1$ , it is composite.

def isPrime(n):
    if n < 2: return False
    for i in xrange(2, n):   # from 2 to n-1
        if n % i == 0:       # n is divisible by i
            return False
    return True

However, it is not necessary to check all numbers from $2$ to $n-1$ .

Suppose that $n$ is composite, so $n=pq$ for $2 \le p,q \le n-1$ . We claim that at least one of $p,q$ is not greater than $\sqrt{n}$ . Indeed, if both are greater than $\sqrt{n}$ , then $pq > \sqrt{n} \cdot \sqrt{n} = n$ , a contradiction. Thus whenever $n$ is composite, one of its factors is not greater than $\sqrt{n}$ , so we can modify the range endpoint above:

from math import sqrt
def isPrime(n):
    if n < 2: return False
    for i in range(2, int(sqrt(n)) + 1):   # from 2 to sqrt(n)
        if n % i == 0:                     # n is divisible by i
            return False
    return True

We loop through $i$ for $\sqrt{n}$ times, so the time complexity is $O(\sqrt{n})$ , multiplied the time complexity of division (about as fast as multiplication at around $O(\lg n \lg \lg n)$ , using Newton-Raphson division algorithm). This gives poor performance for large values of $n$ .

Wilson's Theorem

Wilson's theorem is an interesting number theoretic result that can be turned into a primality test.

A positive integer $n>1$ is prime if and only if $(n-1)! \equiv -1 \pmod{n}$ . $_\square$

Thus we can simply compute $(n-1)! \mod n$ to check whether $n$ is prime.

def isPrime(n):
    fac = 1
    for i in range(1, n):   # from 1 to n-1
        fac = (fac * i) % n
    return (fac == n-1)     # true if (n-1)! mod n = n-1, that is (n-1)! = -1 mod n

However, this is absurdly slow; this takes $O(n)$ multiplications, even slower than trial division above. (This is comparable to the unmodified trial division, checking up to $n-1$ instead of $\sqrt{n}$ .)

Fermat Primality Test

This primality test uses the idea of Fermat's little theorem.

Let $p$ be a prime number and $a$ be an integer not divisible by $p$ . Then $a^{p-1}-1$ is always divisible by $p$ , or $a^{p-1} \equiv 1 \pmod{p}$ . $_\square$

The idea of Fermat primality test is to use the contrapositive: if for some $a$ not divisible by $n$ we have $a^{n-1} \not\equiv 1 \pmod{n}$ , then $n$ is definitely composite.

However, it's not true that if $n$ is composite, then any $a$ works! For example, consider $n = 15, a = 4$ . We can compute that $4^{15-1} = 4^{14} \equiv 1 \pmod{15}$ , so if we have $n = 15$ but we pick $a = 4$ , we cannot conclude that $n$ is composite. So we need to choose $a$ that allows the test to pass, marking $n$ as composite.

But how do we choose such $a$ ? The solution is simple: make it probabilistic; that is, we pick $a$ randomly. To give high confidence, we repeat the test several times, say $k$ times.

import random
def isPrime(n, k):
    for i in range(k):
        a = random.randrange(2, n)  # 2 <= a <= n-1
        if pow(a, n-1, n) != 1:     # compute a^(n-1) mod n
            return False            # definitely composite
    else:
        return True                 # probably prime

If $a^{n-1} \equiv 1 \pmod{n}$ but $n$ is composite, then $a$ is called a Fermat liar for $n$ , and $n$ is called a Fermat pseudoprime to base $a$ . Otherwise, $a$ is called a Fermat witness for $n$ .

To make things worse, there are numbers which are Fermat pseudoprime to all bases! That is, $a^{n-1} \equiv 1 \pmod{n}$ for all $a$ relatively prime to $n$ , but $n$ itself is composite. These numbers are called Carmichael numbers, the first of which is $561$ .

The good thing is that Carmichael numbers are pretty rare. The bad thing is that there are infinitely many of them. This makes this test rarely used, as it's not feasible to simply store all Carmichael numbers (since there are infinitely many of them). Another good thing is that if $n$ is not a Carmichael number, then at least half of the integers in the range $[1,n-1]$ are Fermat witnesses, so if we can tell that $n$ is not a Carmichael number, every iteration of the test passed halves the probability that $n$ is composite. The probability of a composite number is mistakenly called prime for $k$ iterations is $2^{-k}$ .

The running time of the above algorithm is $O(k)$ for $k$ tests, multiplied by $O(\lg n)$ for modular exponentiation, multiplied by some time for multiplication (schoolbook multiplication uses $O(\lg^2 n)$ time). Although this test has flaws (being probabilistic, and Carmichael numbers), it runs in polylogarithmic time, which is fast.

Miller-Rabin Primality Test

Miller-Rabin primality test, in some sense, is a more advanced form of Fermat primality test. Here is the algorithm first, with explanation to follow.

import random
def isPrime(n, k):
    if n < 2: return False
    if n < 4: return True
    if n % 2 == 0: return False    # speedup

    # now n is odd > 3
    s = 0
    d = n-1
    while d % 2 == 0:
        s += 1
        d //= 2
    # n = 2^s * d where d is odd

    for i in range(k):
        a = random.randrange(2, n-1)    # 2 <= a <= n-2
        x = (a**d) % n
        if x == 1: continue
        for j in range(s):
            if x == n-1: break
            x = (x * x) % n
        else:
            return False
    return True

To understand the algorithm, we need a couple of concepts.

First, suppose $p$ is prime, and consider the modular equation $x^2 \equiv 1 \pmod{p}$ . What are the solutions to this equation? We know that $x^2 - 1 \equiv 0 \pmod{p}$ , or $(x-1)(x+1) \equiv 0 \pmod{p}$ . Since $p$ is prime, $p$ has to divide either $x-1$ or $x+1$ ; this leads to $x \equiv 1, -1 \pmod{p}$ .

Now, suppose $p$ is an odd prime and $a$ is an integer relatively prime to $p$ . By Fermat's little theorem, we know that $a^{p-1} \equiv 1 \pmod{p}$ . The idea is that we repeatedly divide the exponent by two: as long as the exponent is even, $a^{2e} \equiv 1 \pmod{p}$ for some $e$ , we can invoke the above result, giving $a^e \equiv -1 \pmod{p}$ or $a^e \equiv 1 \pmod{p}$ . In the latter case, we can invoke the result again, until either we get the first case or the exponent is odd.

In other words, we have the following theorem:

Let $p$ be an odd prime, and let $p-1 = 2^s \cdot d,$ where $d$ is an odd integer and $s$ is a positive integer. Also let $a$ be a positive integer coprime to $p$ . Then at least one of the following must hold:

Some of $\large a^{2^s \cdot d}, a^{2^{s-1} \cdot d}, a^{2^{s-2} \cdot d}, \ldots, a^d$ is congruent to $-1 \pmod{p}$ .

$a^d \equiv 1 \pmod{p}$ .

Miller-Rabin primality test uses the contrapositive of this theorem. That is, if for some $a$ , neither of the above holds, then $p$ is clearly not prime. This explains the bulk of the algorithm:

        x = (a**d) % n
        if x == 1: continue
        for j in range(s):
            if x == n-1: break
            x = (x**2) % n
        else:
            return False

Here, we compute $x = a^d \mod n$ . (In the algorithm, we name the integer as $n$ instead of $p$ as we don't know if it's a prime yet.) If $x \equiv 1 \pmod{n}$ , then the second point in the theorem is true. Otherwise, we test if $x \equiv -1 \pmod{n}$ , which means the first point is true, or otherwise we square $x$ , giving $a^{2^1 \cdot d}$ . We repeat this again, checking if it is congruent to $-1 \pmod{n}$ ; if it doesn't, we square it to $a^{2^2 \cdot d}$ , and so on until $a^{2^{s-1} \cdot d}$ . Since we know $a^{2^s \cdot d} = a^{n-1} \equiv 1 \pmod{n}$ if $n$ is prime, there is no need to check one more time. If we ever encountered the if x == n-1: break line, then the first point is true; otherwise, neither point is true and thus $n$ is certainly composite. Note that we're using the for-else construct here; if the for exits because of break, the else clause is not run, otherwise it is run. When the break is not encountered, we know that neither point is true, so $n$ is certainly composite and thus we can go to the else clause and immediately return False.

Now, the problem is on finding the value of $a$ . How do we select it? There is no good way to do so with certainty, which means we simply select $a$ at random and repeat it for several, $k$ , times, like in Fermat primality test. Just like in Fermat primality test, there are a couple of terms: if for a composite $n$ , $a$ fails to detect the compositeness of $n$ , then $a$ is called a strong liar for $n$ and $n$ is a strong pseudoprime to base $a$ , while if $a$ manages to detect it, it is called a witness for $a$ .

For any composite $n$ , at least $3/4$ of the integers in the range $[2,n-2]$ are witnesses for $n$ , which is better than Fermat primality test: it has Carmichael numbers where none of the integers are witnesses, and even if $n$ is not Carmichael, there are only half witnesses guaranteed. For Miller-Rabin primality test, the probability of a composite is mistakenly passed as a prime for $k$ iterations is $4^{-k}$ .

However, it's still a flaw that the primality is not guaranteed. If we have an upper bound on $n$ , a solution is simply to fix the values of $a$ . For example, by simply testing for $a = 2, 3, 5, 7, 11, 13, 17$ , the test is guaranteed correct as long as $n < 341550071728321$ , so if we can guarantee that $n$ is lower than this, checking those seven values of $a$ is enough. There are many such deterministic variants.

The running time of this algorithm is composed of two parts: preparation, when we compute $n-1 = 2^s \cdot d$ , and testing, when we pick a random $a$ . The preparation is simply $O(\lg n)$ . The testing is $O(k)$ for $k$ trials, multiplied by $O(\lg n)$ for modular exponentiation, multiplied by time for multiplication; this is exactly the same running time as Fermat primality test.

AKS primality test

This section needs a more thorough explanation, as well as code for the algorithm.

So far, the primality tests discussed are either prohibitively slow (polynomial in $n$ instead of $\lg n$ , the number of digits of $n$ ), or only probabilistic (not guaranteed to be correct). This makes people back then to wander, whether the problem PRIMES, determining primality of a test, is actually solvable "quickly" enough (in polynomial time in number of digits) or not; that is, whether it is in complexity class $P$ or not.

In 1976, Gary Miller (the one who invented the Miller-Rabin primality test above, together with Michael Robin) also wrote about "Miller test," a deterministic variant (actually the original version) of Miller-Rabin primality test. This test runs in polynomial time, but its correctness depends on the so-far unsettled question of the generalized Riemann hypothesis, and thus this test is unsatisfactory.

However, in 2002, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena at the Indian Institute of Technology solved this problem affirmatively by constructing such deterministic test that runs in polynomial time that doesn't depend on such unproven hypotheses. Their test is called AKS test, given on a paper titled simply PRIMES is in P.

The idea of the algorithm relies on the simple fact $(x-a)^n \equiv x^n-a \pmod{n}$ for all $a$ coprime to $n$ , if and only if $n$ is prime. Note that the two sides are polynomials; that is, each coefficient must be compared. The proof is simply done by Fermat's little theorem as used in Fermat primality test, together with $\binom{n}{k} \equiv 0 \pmod{n}$ for all $1 \le k \le n-1$ if and only if $n$ is prime.

The above is a primality test by itself, but it takes exponential time in number of digits, so the paper uses a similar fact: $(x-a)^n \equiv x^n-a \pmod{(n, x^r-1)}$ , which means $(x-a)^n - (x^n-a) = nf + (x^r-1)g$ for some polynomials $f,g$ . If $r = O(\lg n)$ , this fact can be checked in polynomial time. This fact is still satisfied by all primes, but some composites also pass, for different values of $a, r$ . The paper thus proves that there exists some small enough $r$ and small enough set $A$ , such that if for all $a \in A$ the relation is satisfied, then $n$ cannot be composite, proving that it is prime.

Currently, the running time of this algorithm is $\tilde{O}(\lg^6 n)$ (that is, $O(\lg^6 n \cdot \lg^k \lg n)$ for some $k$ ). The paper originally had $12$ on the exponent, but this is quickly reduced down by works of various authors. This is still slower than Miller-Rabin primality test as well as more complex, which is why Miller-Rabin test is used more widely at the moment for practical numbers (say, in that $341 \cdot 10^{12}$ range), but theoretically this algorithm is of great interest for being the first algorithm to prove that primality is not a hard problem after all.

Additional Problems

You've stumbled onto a significant vulnerability in a commonly used cryptographic library. It turns out that the random number generator it uses frequently produces the same primes when it is generating keys.

Exploit this knowledge to factor the (hexadecimal) keys below, and enter your answer as the last six digits of the largest factor you find (in decimal).

Key 1:
1c7bb1ae67670f7e6769b515c174414278e16c27e95b43a789099a1c7d55c717b2f0a0442a7d49503ee09552588ed9bb6eda4af738a02fb31576d78ff72b2499b347e49fef1028182f158182a0ba504902996ea161311fe62b86e6ccb02a9307d932f7fa94cde410619927677f94c571ea39c7f4105fae00415dd7d

Key 2: 
2710e45014ed7d2550aac9887cc18b6858b978c2409e86f80bad4b59ebcbd90ed18790fc56f53ffabc0e4a021da2e906072404a8b3c5555f64f279a21ebb60655e4d61f4a18be9ad389d8ff05b994bb4c194d8803537ac6cd9f708e0dd12d1857554e41c9cbef98f61c5751b796e5b37d338f5d9b3ec3202b37a32f

Contents