Cayley-Hamilton Theorem
In linear algebra, when studying a particular matrix, one is often interested in polynomial relations satisfied by this matrix. For example, the matrix \[A = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}\] satisfies \(A^2 - 2A = 5I\), where \(I\) denotes the \(2\)-by-\(2\) identity matrix: \[A^2 - 2A= \begin{pmatrix} 7 & 4 \\ 6 & 7 \end{pmatrix} - \begin{pmatrix} 2 & 4 \\ 6 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = 5I.\] Knowing such relations can be useful in matrix computations (e.g. computing powers of matrices), as well as in investigating the eigenvalues and eigenvectors of a matrix.
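This relation is easy to check numerically. Below is a minimal sketch (assuming numpy is available) that verifies it for the matrix \(A\) above:

```python
import numpy as np

# The matrix A from the example above.
A = np.array([[1, 2],
              [3, 1]])
I = np.eye(2, dtype=int)

# Verify the polynomial relation A^2 - 2A = 5I.
lhs = np.linalg.matrix_power(A, 2) - 2 * A
print(lhs)                          # [[5 0], [0 5]]
print(np.array_equal(lhs, 5 * I))   # True
```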
The Cayley-Hamilton theorem produces an explicit polynomial relation satisfied by a given matrix. In particular, if \(M\) is a matrix and \(p_{M} (x) = \det(M-xI)\) is its characteristic polynomial, the Cayley-Hamilton theorem states that \(p_{M} (M) = 0\).
Motivation
Consider the matrix \(A\) given in the introduction above. Suppose one wishes to compute its fourth power \(A^{4}\). This can be done by using the polynomial relation \(A^2 - 2A = 5I\) as a way to reduce exponents in the expression: \[A^{4} = (2A + 5I)^2 = 4A^2 + 20A + 25I = 4(2A+5I) + 20A + 25 I = 28A + 45I.\] More generally, one can use the recurrence \(A^n = 2A^{n-1} +5A^{n-2}\) to write any power of \(A\) as an integer linear combination of \(A\) and \(I\).
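As a sanity check, both the reduction \(A^4 = 28A + 45I\) and the coefficient recurrence can be verified numerically; a sketch assuming numpy:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 1]])
I = np.eye(2, dtype=int)

# Direct computation of A^4 versus the reduction derived above.
print(np.array_equal(np.linalg.matrix_power(A, 4), 28 * A + 45 * I))  # True

# Track the recurrence on coefficients: write A^n = c*A + d*I.
c, d = 1, 0                    # A^1 = 1*A + 0*I
for _ in range(3):             # step from A^1 up to A^4
    c, d = 2 * c + d, 5 * c    # A^{n+1} = c*A^2 + d*A = c(2A + 5I) + d*A
print(c, d)                    # 28 45
```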
At least for the sake of doing computations like this, one is interested in finding polynomials satisfied by a given matrix. In symbols, for a matrix \(M\), one seeks polynomials \(p(x)\) such that \(p(M) = 0\). Note that here, \(p(M)\) is itself a matrix, not a number; the \(0\) in this expression denotes the matrix whose entries are all zero.
One way to find a polynomial satisfied by an \(n\)-by-\(n\) matrix \(M\) is to exploit the vector space structure on the set of all \(n\)-by-\(n\) matrices. Let \(M_n (F)\) denote the vector space of \(n\)-by-\(n\) matrices with entries in a field \(F\) (for instance, \(F\) could be the real numbers \(\mathbb{R}\) or the complex numbers \(\mathbb{C}\)). Since an \(n\)-by-\(n\) matrix has \(n^2\) entries, the space \(M_n (F)\) has dimension \(n^2\). This implies that the \(n^2 + 1\) matrices \(I, M, M^2, \ldots, M^{n^2}\) are linearly dependent over \(F\). Thus, there are coefficients \(a_i \in F\), \(0\le i \le n^2\), not all zero, such that \[a_0 I + a_1 M + a_2 M^2 + \cdots + a_{n^2} M^{n^2} = 0.\] The nonzero polynomial \(q(x) = a_0 + a_1 x + \cdots + a_{n^2} x^{n^2}\) is therefore satisfied by \(M\).
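Although this counting argument gives no formula for the \(a_i\), for a concrete matrix one can search for such a dependence numerically. Below is an illustrative sketch (numpy assumed; the function name annihilating_poly is ad hoc) that finds a null vector of the flattened powers:

```python
import numpy as np

def annihilating_poly(M):
    """Find coefficients a_0, ..., a_{n^2} (not all zero) with
    a_0 I + a_1 M + ... + a_{n^2} M^{n^2} = 0."""
    n = M.shape[0]
    # Flatten each power of M into a column of an n^2-by-(n^2 + 1) matrix.
    S = np.column_stack([np.linalg.matrix_power(M, i).flatten()
                         for i in range(n**2 + 1)])
    # S has more columns than rows, so it has a null vector; the right
    # singular vector for the smallest singular value is one such vector.
    _, _, Vt = np.linalg.svd(S)
    return Vt[-1]

M = np.array([[1.0, 2.0],
              [3.0, 1.0]])
a = annihilating_poly(M)
check = sum(c * np.linalg.matrix_power(M, i) for i, c in enumerate(a))
print(np.allclose(check, 0))   # True
```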
However, this method is lacking in that it is nonconstructive: the coefficients \(a_i\) are shown to exist, but no formula for them is produced. The Cayley-Hamilton theorem, by contrast, is constructive: \(M\) is shown to satisfy an explicit and easily computed polynomial, namely the characteristic polynomial of \(M\). This polynomial is \[p_{M} (x) = \det(M - x \cdot I),\] and the Cayley-Hamilton theorem states that \(p_{M} (M) = 0\). Proving this is not as simple as making the substitution \(p_{M} (M) = \det(M - M \cdot I) = \det(0) = 0\), since the variable \(x\) in the definition of the characteristic polynomial represents a scalar, not a matrix; \(p_M(M)\) means the result of substituting \(M\) into the polynomial obtained after the determinant is expanded.
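Numerically, one can verify \(p_M(M) = 0\) for a concrete matrix. A sketch assuming numpy (note that np.poly returns the coefficients of the monic polynomial \(\det(xI - M)\), which differs from \(p_M(x)\) only by the sign \((-1)^n\), so the check is unaffected):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 1.0]])

# Coefficients of det(xI - M), highest degree first: x^2 - 2x - 5.
coeffs = np.poly(M)

# Evaluate the polynomial at the matrix M via Horner's rule, with the
# identity matrix standing in for the constant term.
n = M.shape[0]
result = np.zeros((n, n))
for c in coeffs:
    result = result @ M + c * np.eye(n)
print(np.allclose(result, 0))   # True: Cayley-Hamilton
```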
Proof assuming \(M\) has entries in \(\mathbb{C}\)
Suppose \(M\) is an \(n\)-by-\(n\) matrix. When \(M\) has entries in \(\mathbb{C}\), one can prove the Cayley-Hamilton theorem as follows:
A matrix \(M \in M_n (\mathbb{C})\) is called diagonalizable if there exists an invertible \(B \in M_n (\mathbb{C})\) such that \(BMB^{-1}\) is diagonal. Recall that a diagonal matrix is a matrix for which all entries off the main diagonal (the diagonal from top left to bottom right) are zero. Diagonal matrices have a very simple multiplicative structure: when one multiplies two diagonal matrices, the entries on the main diagonals multiply termwise. In particular, one can see why a diagonal matrix satisfies its own characteristic polynomial: if \(D\) has diagonal entries \(\lambda_1, \ldots, \lambda_n\), then \(p_D(D)\) is the diagonal matrix with entries \(p_D(\lambda_1), \ldots, p_D(\lambda_n)\), and each \(\lambda_i\) is an eigenvalue of \(D\), hence a root of \(p_D\). Consequently, it follows that any diagonalizable matrix also satisfies its own characteristic polynomial: since similar matrices have the same characteristic polynomial \(p\), and polynomial evaluation commutes with conjugation, \[0 = p(BMB^{-1}) = Bp(M)B^{-1} \implies p(M) = 0.\] (Both facts are checked numerically in the sketch following the lemma below.) The proof of Cayley-Hamilton therefore proceeds by approximating arbitrary matrices with diagonalizable matrices (this will be possible to do when the entries of the matrix are complex, exploiting the fundamental theorem of algebra). To do this, first one needs a criterion for diagonalizability of a matrix:
Lemma: If \(M\in M_{n} (\mathbb{C})\) has \(n\) distinct eigenvalues, then \(M\) is diagonalizable.
Suppose the roots of the characteristic polynomial \(p_{M} (x)\) are all distinct, i.e. the eigenvalues \(\lambda_i\), \(1\le i \le n\), satisfy \(\lambda_i \neq \lambda_j\) for \(i \neq j\). Let \(v_i\) denote an eigenvector associated with \(\lambda_i\).
Assume that there are coefficients \(a_i \in \mathbb{C}\) with \[a_1 v_1 + \cdots + a_n v_n = 0. \] Applying \(M^k \) to this equation gives the relation \[a_1 \lambda_{1}^{k} v_1 + \cdots + a_n \lambda_{n}^{k} v_n = 0.\] Choose a polynomial \(q_i\) such that \(q_i (\lambda_i) = 1\) and \(q_i (\lambda_j) = 0\) for \(j\neq i\) (using, for instance, Lagrange interpolation). Taking the linear combination of the equations above with the coefficients of \(q_i\) (equivalently, applying \(q_i(M)\) to the original relation) gives \[a_1 q_i(\lambda_1) v_1 + \cdots + a_n q_i(\lambda_n) v_n = a_i v_i = 0,\] and since \(v_i \neq 0\), it follows that \(a_i = 0\). Thus, the eigenvectors \(\{v_i\}\) are linearly independent.
Since there are \(n\) eigenvectors, the collection \(\{v_i\}\) must form a basis for the vector space \(\mathbb{C}^n\). In this basis, the matrix for the linear transformation corresponding to \(M\) is diagonal. Let \(B\) denote the change of basis matrix sending \(v_i\) to the \(i^\text{th}\) standard basis vector \(\big(\)i.e. \((0,\cdots,0,1,0,\cdots,0),\) where \(1\) is in the \(i^\text{th}\) slot\(\big).\) Then \(BMB^{-1}\) is diagonal, as desired.
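Both the lemma and the conjugation identity above can be illustrated numerically. A sketch assuming numpy, using the example matrix from the introduction (whose two eigenvalues \(1 \pm \sqrt{6}\) are distinct):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 1.0]])
eigvals, V = np.linalg.eig(M)   # the columns of V are eigenvectors v_i

# B = V^{-1} sends v_i to the i-th standard basis vector, so BMB^{-1}
# is the diagonal matrix of eigenvalues.
B = np.linalg.inv(V)
D = B @ M @ np.linalg.inv(B)
print(np.allclose(D, np.diag(eigvals)))   # True

# p_M(x) = (lambda_1 - x)(lambda_2 - x); the diagonal matrix D satisfies
# it because each factor zeroes out one diagonal slot, and then
# M = B^{-1} D B satisfies it by the conjugation identity.
def p_of(X):
    result = np.eye(X.shape[0])
    for lam in eigvals:
        result = result @ (lam * np.eye(X.shape[0]) - X)
    return result

print(np.allclose(p_of(D), 0))   # True: the diagonal case
print(np.allclose(p_of(M), 0))   # True: p_of(M) = B^{-1} p_of(D) B
```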
Corollary: If \(M \in M_n (\mathbb{C})\), for any \(\epsilon > 0\) there is a diagonalizable matrix \(N \in M_n (\mathbb{C})\) such that entries in \(M\) are within \(\epsilon\) of corresponding entries in \(N\).
The roots of \(p_{M} (x)\) are continuous functions of the entries of \(M\), so arbitrarily small perturbations of these entries suffice to make all the roots distinct. (Concretely, one can upper-triangularize \(M\) over \(\mathbb{C}\) and perturb the diagonal entries, which are exactly the roots of \(p_M(x)\), to be distinct.) This reasoning is only valid because of the fundamental theorem of algebra, which ensures all roots of \(p_{M} (x)\) lie in \(\mathbb{C}\).
To complete the proof, let \(\{N_k\}_{k\in\mathbb{N}} \) be a sequence of diagonalizable matrices converging to \(M\) (where convergence is matrix-entry-wise). Since \(p_{N_k} (N_k) = 0\) for all \(k \in \mathbb{N}\) and \(N \mapsto p_{N}(N)\) is continuous as a function of the matrix entries, taking \(k\to\infty\) implies \(p_{M} (M) = 0\), which is the result of the Cayley-Hamilton theorem.
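The approximation step can also be watched in action: perturb a non-diagonalizable matrix into diagonalizable ones and observe that the Cayley-Hamilton identity holds along the whole sequence. A sketch assuming numpy:

```python
import numpy as np

def char_poly_at_self(M):
    """Evaluate det(xI - M) at x = M via Horner's rule (zero iff
    Cayley-Hamilton holds for M, since the sign (-1)^n is immaterial)."""
    coeffs = np.poly(M)
    n = M.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ M + c * np.eye(n)
    return result

# A Jordan block: not diagonalizable (its only eigenvalue is 0, with a
# one-dimensional eigenspace).
J = np.array([[0.0, 1.0],
              [0.0, 0.0]])

# The perturbations J_eps have distinct eigenvalues +/- sqrt(eps), so they
# are diagonalizable; each satisfies its characteristic polynomial, and by
# continuity so does the limit J.
for eps in [1e-1, 1e-3, 1e-6]:
    J_eps = J + np.array([[0.0, 0.0], [eps, 0.0]])
    print(eps, np.max(np.abs(char_poly_at_self(J_eps))))   # ~0 each time
print(np.max(np.abs(char_poly_at_self(J))))                # 0 in the limit
```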
Examples and Problems
Prove that the inverse of a \(2\)-by-\(2\) matrix (assuming \(ad - bc \neq 0\), i.e. the matrix is invertible) is given by the formula below:
\[\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .\]
Let \(A\) denote the given matrix; its characteristic polynomial is \[p_{A} (x) = (a-x)(d-x)-bc = x^2 - (a+d)x + (ad-bc).\] By the Cayley-Hamilton theorem, it follows that \[A^2 - (a+d)A + (ad-bc) I = 0,\] where \(I\) is the \(2\)-by-\(2\) identity matrix. Thus, \[A\big((a+d)I - A\big) = (ad-bc) I \implies A^{-1} = \frac{1}{ad-bc} \big((a+d)I - A\big) = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .\ _\square\]
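The formula is easy to sanity-check against a library inverse; a sketch assuming numpy, using the matrix from the introduction:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]

# Cayley-Hamilton rearranged: A^{-1} = ((a+d) I - A) / (ad - bc).
A_inv = ((a + d) * np.eye(2) - A) / (a * d - b * c)
print(np.allclose(A_inv @ A, np.eye(2)))      # True
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```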
Let \(M\) be an \(n\)-by-\(n\) matrix. \(M\) is called nilpotent if there exists some positive integer \(k\) such that \(M^k = 0\). Prove that if \(M\) is nilpotent, then \(M^n = 0\).
Suppose \(v\) is an eigenvector of \(M\) with eigenvalue \(\lambda\). Then \[0 = M^k v = \lambda^k v \implies \lambda = 0,\] since \(v \neq 0\). Since every root of the (degree \(n\)) characteristic polynomial \(p_{M} (x)\) is an eigenvalue, it follows that \(p_{M} (x) = (-1)^n x^n\). By the Cayley-Hamilton theorem, we may now conclude \(M^n = 0\). \(_\square\)
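For instance, any strictly upper triangular matrix is nilpotent, and the result predicts that its \(n\)-th power vanishes even when lower powers do not; a quick check assuming numpy:

```python
import numpy as np

# A strictly upper triangular 3x3 matrix: nilpotent, with N^3 = 0
# guaranteed by the result above.
N = np.array([[0, 1, 4],
              [0, 0, 2],
              [0, 0, 0]])
print(np.linalg.matrix_power(N, 2))   # nonzero
print(np.linalg.matrix_power(N, 3))   # the zero matrix
```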
Consider the \(2\)-by-\(2\) matrix \[A = \begin{pmatrix} 5 & 1 \\ -1 & 3 \end{pmatrix}. \] What is the trace of \(A^{100}?\)
Note: The trace of a \(2\)-by-\(2\) matrix is defined as \[\text{Tr}\begin{pmatrix} a & b \\ c& d \end{pmatrix} = a+d.\]
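One possible line of attack (a sketch, not a full solution): Cayley-Hamilton gives \(A^2 = \text{Tr}(A)\, A - \det(A)\, I\), so every power of \(A\) is a combination \(c\,A + d\,I\), and the coefficients obey a two-term recurrence that exact integer arithmetic can iterate up to \(A^{100}\). Running the snippet below and printing trace_A100 reveals the answer, so try the problem first:

```python
# Entries of A = [[5, 1], [-1, 3]]; plain Python integers are used because
# the coefficients grow far beyond 64-bit range by n = 100.
t = 5 + 3                        # Tr(A)
det = 5 * 3 - 1 * (-1)           # det(A)

# Write A^n = c*A + d*I; Cayley-Hamilton gives A^2 = t*A - det*I.
c, d = 1, 0                      # A^1 = 1*A + 0*I
for _ in range(99):              # iterate from A^1 up to A^100
    c, d = t * c + d, -det * c   # A^{n+1} = c*A^2 + d*A
trace_A100 = c * t + 2 * d       # Tr(A^100) = c*Tr(A) + d*Tr(I)
```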
Consider the set of \(n\)-by-\(n\) matrices whose entries are integers modulo \(p\), for some prime \(p\). The matrices in this set with nonzero determinant are invertible, and thus form a group under matrix multiplication; this group is given the name \(\text{GL}(n,p)\), where the "GL" stands for "general linear."
Given a matrix \(M \in \text{GL}(n,p)\), its order is defined to be the smallest positive integer \(k\) such that \(M^k = I\), where \(I \in \text{GL}(n,p)\) denotes the identity matrix. In this problem, you will compute the maximum possible order of a matrix in \(\text{GL}(n,p)\). That is, you will answer the question "What is the largest integer \(m \in \mathbb{N}\) such that \(m\) is the order of a matrix in \(\text{GL}(n,p)?\)"
Hint & Proof Sketch:
Let \(A\) be a matrix in \(\text{GL}(n,p)\). Consider the vector space \(V(A,p)\) generated by the powers of \(A\), with coefficients in the field of \(p\) elements. That is, \(V(A,p)\) is the vector space (under addition) of linear combinations \[a_0 I + a_1 A + a_2 A^2 + a_3 A^3 + \cdots,\] where the coefficients \(a_i\) are integers modulo \(p\). The Cayley-Hamilton theorem implies \(V(A,p)\) is finite dimensional \(\big(\)indeed, spanned by \(I, A, \ldots, A^{n-1}\big)\); what is the largest possible value of its dimension \(\big(\)as \(A\) ranges over the group \(\text{GL}(n,p)\big)?\)
Suppose \(\dim\big(V(A,p)\big) = k\). What does this imply about the order of \(A\) in \(\text{GL}(n,p)?\) To answer this question, note that any power of \(A\) must be in \(V(A,p)\), since the exponents in powers of \(A\) can be reduced using the polynomial relation produced by Cayley-Hamilton.
Let \(K\) denote the largest possible value of \(\dim\big(V(A,p)\big)\) when \(A\) ranges over \(\text{GL}(n,p)\). Can you find a matrix \(B \in \text{GL}(n,p)\) whose order is at least \(p^K - 1?\) What does this imply in light of the previous two steps?
As your answer to this problem, submit the maximum possible order of a matrix in \(\text{GL}(4,5)\), i.e. \[\max_{A \in \text{GL}(4,5)} \text{Order}(A).\]
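For experimentation, the order of any particular matrix can be computed by brute force. Below is a sketch in plain Python (the helper names matmul_mod and order_mod are ad hoc), shown on a small example in \(\text{GL}(2,3)\) so as not to spoil the requested answer for \(\text{GL}(4,5)\):

```python
def matmul_mod(X, Y, p):
    """Multiply two square matrices with entries reduced modulo p."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]

def order_mod(A, p, max_steps=10**6):
    """Smallest positive k with A^k = I modulo p (A assumed invertible)."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    power, k = A, 1
    while power != identity:
        power = matmul_mod(power, A, p)
        k += 1
        if k > max_steps:
            raise ValueError("order not found; is A invertible mod p?")
    return k

# The companion matrix of x^2 - x - 1 over the field of 3 elements:
A = [[0, 1],
     [1, 1]]
print(order_mod(A, 3))   # 8, which equals 3^2 - 1
```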