# This note has been used to help create the Knapsack Cryptosystem wiki

This note offers a comprehensive explanation of one of the world's earliest public key cryptosystems. This note is self-contained and has no prerequisite except elementary knowledge in Number Theory.To view the previous two posts of my cryptography series click the links below:

The Diffie-Hellman Key Exchange system

\(\textbf{Introduction:-}\)A public-key cryptosystem based on a special case of the classic problem in combinatorics known as the *knapsack problem* was developed by Ralph Merklee and Martin Hellman in 1978.It is one of the earliest public key cryptosystems.Thsi special case of the knapsack problem is also known as the *subset sum problem* To explain this algorithm first we need to understand what the subset sum problem is.Here is a picture of the two developers.

\(\textbf{The subset sum problem:-}\)

It can be stated as follows :-

"*Given a knapsack of volume \(V\), and \(n\) elements of various volumes \(a_1,a_2,a_3,.......,a_n\),can a subset of these elements be found that will completely fill the knapsack??*"

Another alternative form of the problem is :

"*For positive integers \(a_1,a_2,a_3,...,a_n\) and a sum \(V\),solve the equation \[V = a_1x_1+a_2x_2+......a_nx_n\] where the value of \(x_i\) can be 0 or 1 for all \(i = 1\) to \(n\).*"

Clearly there may be no solution,or more than one solution to the problem depending on the choice of the sequence \(a_1,a_2,...a_n\) and the integer \(V\).

For example, the equation \[3x_1 + 7x_2 + 9x_3+1x_4 + 20x_5 = 22\] has no solution

and the equation \[3x_1+7x_2+9x_3+11x_4+20x_5 = 22\] has 2 distinct solutions which are \(x_2 = x_3 = x_4 = 1 \) , \(x_1 = x_5 = 0\) and \(x_2 = x_5 =1\) and \(x_1 = x_3 = x_4 = 0\)

The solution to any randomly chosen knapsack problem is NP-hard.All the methods that are used are almost as time consuming as conducting an exhaustive search,that is testing all the \(2^{n}\) possibilities of \(x_1,x_2,....x_n\).This is computationally in-feasible for \(n >100\)

However if the sequence of integers \(a_1,a_2,....,a_n\) have some special properties ,the knapsack problem becomes much easier to solve. we call a sequence of integers to be super-increasing when \(a_i\) is greater than each of it's preceding terms,i.e, \[a_i > a_1+a_2+...........+a_{i-1}\] for \(i = 1\) to \(n\).

A simple example of a super-increasing sequence is \(1,2,4,8,..........2^{n}\) where \(2^{i} > 2^{i} - 1 = 1+2+4+........+2^{i-1}\).

If a particular knapsack problem is based on a super-increasing sequence it has an unique solution is it is solvable at all.

If we want to solve the problem \[V = a_1x_1 + a_2x_2 + ....a_nx_n\] where \(a_1,a_2,......a_n\) is a super-increasing sequence of integers then clearly the equation is solvable if \( V \le a_1+a_2+...........+a_n\).Working from left to right we begin by putting \(x_n = 1\) is \(V \ge a_n\) and \(x_n = 0\) if \(V < a_n\).Then we obtain the remaining terms by choosing \(x_i = 0\) if \(V - (a_{i+1}x_{i+1}+........+a_{n}x_{n}) \ge a_i \) \(x_i = 1\) if \(V - (a_{i+1}x_{i+1}+........+a_{n}x_{n}) < a_i \). With this algorithm in mid the super increasing knapsack problems can be solved quite easily.

For example,

\[2x_1 + 7x_2 + 12x_3 + 21x_4 + 85x_5 = 30\]

we start with the largest co-efficient in the equation,\(85 > 30\)hence \(x_5 = 0\). The next co-efficient is \(21\),\(21<30\).Now the sum of the co-effiecients is \(2+5+12 = 19 < 30\),so this cannot fill in the knapsack,so \(21\) must be included . So now tthe original problem may be written as,

\[2x_1 + 7x_2 + 12x_3 = 9\] In this way we can solve for the terms.

\(\textbf{Key-Generation for the cipher:-}\)

Tn the Merklee-Hellman cipher,the keys are two knapsacks.The public key is the hard knapsack and the private key is the super-increasing knapsack.Two numbers,one called a multiplier and the other called a modulus are used to convert the super-increasing knapsack into the hard knapsack.These same number are used to convert the sum of the subsets of the hard knapsack into the sum of the subsets of the easy knapsack which is a problem solvable in polynomial time.

\(\textbf{ALGORITHM:-}\)

\(\textbf{Encryption:-}\)

\(\textbf{Step 1.}\)A typical user first chooses a super-increasing sequence \(a_1,a_2,...,a_n\).Then a modulus \(m > 2a_n\) and a multiplier \(0 < a <m\) and and \(gcd(a,m) = 1\) is selected.This ensures that the congruence \(ax \equiv 1 \pmod{m}\) has a unique solution say \( x \equiv c \pmod{m}\).

\(\textbf{Step 2.}\)Then the sequence of integers \(b_1,b_2,...b_n\) defined by \[b_i \equiv aa_i \pmod{m}\] for all \(i=1\) to \(n\) where \(0<b_i<m\) is selected.Carrying out this transformation generally destroys the super increasing property enjoyed by \(a_i\).

\(\textbf{Step 3.}\) The user keeps secret the original sequence \(a_1,a_2,...a_n\) and \(m\) and \(a\), but publishes \(b_1,b_2,...b_n\) in a public directory.Anyone who wishes to send a message employs this publicly available sequence as the encryption key.

The sender begins by converting the plaintext message into a string of \(0\)s and \(1\)s by using the binary representation of digits as shown below.

\(\textbf{Step 4.}\)The string is then split into \(n\) blocks of binary digits.The public encrypting sequence \(b_1,b_2,...,b_n\) is then used to transform the given plaintext, say \(x_1,x_2,...,x_n\) into the sum \[S = b_1x_1 + b_2x_2 +...... b_nx_n\] The number S is the hidden information that the sender transmits over an insecure communication channel.

Clearly since each \(x_i\) is either \(0\) or \(1\),the problem of obtaining the plaintext block from S is equivalent to solving a difficult knapsack problem.But the private key can be used to convert this hard knapsack into an easy one.

\(\textbf{Decryption:-}\)

Knowing the values of \(c\) and \(m\),the recipient computes, \[S' = cS \pmod{m}\] where \(0 < S' < m\),

\[S' = cb_1 + cb_2 + cb_3 + .... +cb_n \pmod{m}\]

\[S' = caa_1 + caa_2 + caa_3 + .... +caa_n \pmod{m}\]

Since \(ca \equiv 1 \pmod{m} \),we have,

\[S' = a_1x_1 + .... + a_nx_n \pmod{m}\].

Because m was chosen such that \(m > 2a_n > a_1 + a_2 +... + a_n\), we have \(a_1x_1 + .... + a_nx_n < m\).So in the light of the condition \[S' = a_1x_1 + .... + a_nx_n \] must hold.The solution of this super increasing knapsack problem furnishes a solution to a difficult problem and the plain-text block \(x_1,x_2,....x_n\) is recovered which is converted back to it's alphabetic representation.

Click this link to see the knapsack cryptosystem at work

\(\textbf{Security:-}\)This cipher aroused a great deal of interest at the beginning because it was based on a probably hard problem but in 1982 Adi Shamir provided a fast algorithm for solving knapsack problems.The main weakness of this system is that the sequence \(b_1,b_2,....,b_n\) is too special.The system can be made difficult by changing the values of \(a\) and \(m\) at regular intervals(iterating) but even that variant was cracked in the year 1985!! Since this cipher has been broken , it makes the RSA Cipher the dominant public key cryptography algorithm.

## Comments

There are no comments in this discussion.