Knapsack Cryptosystem

The knapsack cryptosystem is a public-key cryptosystem based on a special case of the classic problem in combinatorics known as the knapsack problem. It was developed by Ralph Merklee and Martin Hellman in 1978 and is one of the earliest public key cryptosystems. This special application of the knapsack problem is also akin to the subset sum problem, where the solution is rather time-consuming to compute as it belongs in the complexity class of NP.

Introduction

The knapsack cryptosystem relies heavily on the subset sum problem. This problem can be stated as follows:

Subset Sum Problem

Given a knapsack of volume $V$ and $n$ elements of various volumes $a_1,a_2,a_3,\ldots,a_n$ , can a subset of these elements be found that will completely fill the knapsack?

Another alternative form of the problem can be stated as follows:

For positive integers $a_1,a_2,a_3,...,a_n$ and a sum $V$ , solve the equation

$V = a_1x_1+a_2x_2+\cdots+a_nx_n,$

where the value of $x_i$ can be 0 or 1 for all $i = 1$ to $n$ .

It is evident that there may be no solution or more than one solution to the subset sum problem, depending on the choice of the sequence $a_1,a_2,\ldots,a_n$ and the integer $V$ . This is the same as saying that the equation

$3x_1 + 7x_2 + 9x_3+1x_4 + 20x_5 = 22$

has no solution, whereas the equation

$3x_1+7x_2+9x_3+11x_4+20x_5 = 27$

has 2 distinct solutions, which are

$x_2 = x_3 = x_4 = 1, x_1 = x_5 = 0 \qquad \text{and}\qquad x_2 = x_5 =1, x_1 = x_3 = x_4 = 0.$

The solution to any randomly chosen knapsack problem is of time complexity NP-hard, meaning that it is not efficient for a computer to solve. That said, all the known algorithms are as time-consuming as conducting an exhaustive search, testing all the $2^{n}$ possibilities of $x_1,x_2,\ldots, x_n$ . This is computationally infeasible when the number of variables, $n$ , is greater than $100$ .

In order to alleviate the need for the extensive time complexity presented in the subset sum problem, algorithms often attempt to identify special properties within the coefficients, or values, of each item in the knapsack, $a_1,a_2,\ldots,a_n$ , that make the problem much easier to solve.

One of these characteristics arises when a sequence of integers is super-increasing, meaning that $a_i$ is greater than the sum of its preceding terms. Namely,

$a_i > a_1+a_2+\cdots+a_{i-1}$

for $i = 1$ to $n$ . A simple example of a super-increasing sequence is presented in the numbers $1,2,4,8,\ldots,2^{n},$ where

$2^{i} > 2^{i} - 1 = 1+2+4+\cdots+2^{i-1}.$

If a particular knapsack problem is based on a super-increasing sequence, it has a unique solution, granted it is solvable at all. If we want to solve the problem

$V = a_1x_1 + a_2x_2 +\cdots+a_nx_n,$

where $a_1,a_2,\ldots,a_n$ is a super-increasing sequence of integers, then the equation is solvable if and only if

$V \le a_1+a_2+\cdots+a_n.$

The solution can be attained by assigning

$x_n = 1\ \text{ if }\ V \ge a_n\quad \text{and} \quad x_n = 0\ \text{ if }\ V < a_n.$

Then, the remaining terms are obtained by choosing

$x_i = 0\ \text{ if }\ V - (a_{i+1}x_{i+1}+\cdots+a_{n}x_{n}) \ge a_i \quad \text{and} \quad x_i = 1\ \text{ if }\ V - (a_{i+1}x_{i+1}+\cdots+a_{n}x_{n}) < a_i .$

This assignment produces a computationally easy problem to solve, taking a marginal amount of time compared to the original problem.

For example, for a problem like

$2x_1 + 7x_2 + 12x_3 + 21x_4 + 85x_5 = 30,$

the largest coefficient in the equation is assigned first, $85 > 30$ , and the coefficient is assigned to be $x_5 = 0$ . The next coefficient is $21$ , and $21<30$ . In this case, the sum of the remaining coefficients is

$2+5+12 = 19 < 30,$

which cannot fill in the knapsack. Therefore $21$ must be included, and now the original problem may be written as

$2x_1 + 7x_2 + 12x_3 = 9,$

which can be solved quite easily.

Key Generation

The knapsack cryptosystem, also known as the Merklee-Hellman cipher, begins by generating user encryption keys as two knapsacks, one public and one private. The public key is the hard knapsack and the private key is the super-increasing knapsack. Two numbers, one called a multiplier and the other a modulus, are used to convert the super-increasing knapsack into the hard knapsack. These same numbers are used to convert the sum of the subsets of the hard knapsack into the sum of the subsets of the easy knapsack, which is a problem solvable in polynomial time.

Algorithm

Encryption:

Step 1. A typical user first chooses a super-increasing sequence $a_1,a_2,...,a_n$ .Then a modulus $m > 2a_n$ and a multiplier $0 < a <m$ such that $\gcd(a,m) = 1$ are selected. This ensures that the congruence $ax \equiv 1 \pmod{m}$ has a unique solution, say, $x \equiv c \pmod{m}$ .
Step 2. Then the sequence of integers $b_1,b_2,\ldots,b_n$ defined by $b_i \equiv aa_i \pmod{m}$ for all $i=1$ to $n,$ where $0<b_i<m,$ is selected. Carrying out this transformation generally destroys the super-increasing property enjoyed by $a_i$ .
Step 3. The user keeps secret the original sequence $a_1,a_2,\ldots,a_n$ and $m$ and $a$ , but publishes $b_1,b_2,\ldots,b_n$ in a public directory. Anyone who wishes to send a message employs this publicly available sequence as the encryption key. The sender begins by converting the plaintext message into a string of $0$ 's and $1$ 's by using the ASCII alphabet binary representation of digits.
Step 4. The string is then split into $n$ blocks of binary digits. The public encrypting sequence $b_1,b_2,\ldots,b_n$ is then used to transform the given plaintext, say, $x_1,x_2,\ldots,x_n$ into the sum $S = b_1x_1 + b_2x_2 +\cdots+b_nx_n,$ where the number $S$ is the hidden information that the sender transmits over an insecure communication channel.

Since each $x_i$ is either $0$ or $1$ , the problem of obtaining the plaintext block from $S$ is equivalent to solving a difficult knapsack problem. However, the private key can be used to convert this hard knapsack into an easy one, allowing an efficient decryption process.

Decryption:

Knowing the values of $c$ and $m$ , the recipient computes $S' = cS \pmod{m},$ where $0 < S' < m:$

$\begin{aligned} S' &= cb_1 + cb_2 + cb_3 +\cdots+cb_n \pmod{m}\\ &= caa_1 + caa_2 + caa_3 +\cdots+caa_n \pmod{m}. \end{aligned}$

Since $ca \equiv 1 \pmod{m}$ ,

$S' = a_1x_1 +\cdots+ a_nx_n \pmod{m}.$

Because $m$ was chosen such that $m > 2a_n > a_1 + a_2 +\cdots+ a_n$ , this leaves

$a_1x_1 +\cdots+ a_nx_n < m.$

Therefore,

$S' = a_1x_1 + \cdots + a_nx_n$

must hold. The solution of this super-increasing knapsack problem furnishes a solution to a difficult problem, and the plaintext block $x_1,x_2,\ldots,x_n$ is recovered, which is converted back to its alphabetic representation.

For a more thorough explanation of the knapsack cryptosystem, refer to this worked example of the algorithm.

Security

This cipher aroused a great deal of interest at the beginning because it was based on a probably hard problem, but in 1982 Adi Shamir provided a fast algorithm for solving knapsack problems. The main weakness of this system is that the sequence $b_1,b_2,\ldots,b_n$ is too special. The system can be made difficult by changing the values of $a$ and $m$ at regular intervals (iterating) but even that variant was cracked in 1985. Since the knapsack cryptosystem is obsolete, the RSA cipher is now the dominant public key cryptography algorithm.

Contents