Matrix Diagonalization
A diagonal matrix is a square matrix whose only nonzero entries lie on the main diagonal: \[ D = \begin{pmatrix} d_{11} & & & \\ & d_{22} & & \\ & & \ddots & \\ & & & d_{nn} \end{pmatrix}. \] A square matrix is said to be diagonalizable if it is similar to a diagonal matrix. That is, \(A\) is diagonalizable if there is an invertible matrix \(P\) and a diagonal matrix \(D\) such that \(A=PDP^{-1}.\)
The matrix \(A = \begin{pmatrix} 0&1\\1&0 \end{pmatrix}\) is diagonalizable: \[ A = \begin{pmatrix}1&1\\1&-1 \end{pmatrix} \begin{pmatrix} 1&0\\0&-1 \end{pmatrix} \begin{pmatrix}1&1\\1&-1 \end{pmatrix}^{-1}. \] On the other hand, the matrix \(B = \begin{pmatrix} 1&1\\0&1 \end{pmatrix}\) is not diagonalizable, as we will see below.
By the change of basis theorem, an \(n\times n\) matrix \(A\) with entries in a field \(F\) is diagonalizable if and only if there is a basis of \(F^n\) consisting of eigenvectors of \(A.\) \((\)In what follows, we will mostly assume \(F={\mathbb R}\) or \({\mathbb C},\) but the definition is valid over an arbitrary field.\()\) This extends immediately to a definition of diagonalizability for linear transformations: if \(V\) is a finite-dimensional vector space, we say that a linear transformation \(T \colon V \to V\) is diagonalizable if there is a basis of \(V\) consisting of eigenvectors for \(T.\)
So the process of diagonalizing a matrix involves computing its eigenvectors and following the recipe of the change-of-basis theorem to compute the matrices \(P\) and \(D.\) Matrix diagonalization is useful in many computations involving matrices, because multiplying diagonal matrices is quite simple compared to multiplying arbitrary square matrices. In particular, the powers of a diagonalizable matrix can be easily computed once the matrices \(P\) and \(D\) are known, as can the matrix exponential.
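For readers who want to experiment, here is a minimal numerical sketch of this recipe (using NumPy, an added illustration rather than part of the discussion above): np.linalg.eig returns the eigenvalues and a matrix whose columns are corresponding eigenvectors, which play the roles of \(D\) and \(P.\)

```python
import numpy as np

# The 2x2 example from above: A should equal P D P^{-1}.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

eigvals, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.diag(eigvals)            # eigenvalues on the diagonal

assert np.allclose(A, P @ D @ np.linalg.inv(P))

# Powers are now cheap: A^10 = P D^10 P^{-1}, and D^10 just raises
# each diagonal entry to the 10th power.
A10 = P @ np.diag(eigvals**10) @ np.linalg.inv(P)
assert np.allclose(A10, np.linalg.matrix_power(A, 10))
```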
Diagonalizability with Distinct Eigenvalues
Review Eigenvalues and Eigenvectors.
The first theorem about diagonalizable matrices shows that a large class of matrices is automatically diagonalizable.
If \(A\) is an \(n\times n\) matrix with \(n\) distinct eigenvalues, then \(A\) is diagonalizable.
Explicitly, let \(\lambda_1,\ldots,\lambda_n\) be these eigenvalues. Then \[A=P \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix} P^{-1},\] where the \(j^\text{th}\) column of \(P\) is an eigenvector of \(A\) with eigenvalue \(\lambda_j.\)
Let \(v_i\) be an eigenvector with eigenvalue \(\lambda_i,\) \(1 \le i \le n.\) Then the key fact is that the \(v_i\) are linearly independent. To see this, let \(k\) be the largest positive integer such that \(v_1,\ldots,v_k\) are linearly independent. If \(k \ne n,\) then \(v_{k+1}\) is a linear combination of \(v_1,\ldots,v_k\): \[ a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = v_{k+1} \] for some coefficients \(a_i,\) not all zero \((\)since \(v_{k+1},\) being an eigenvector, is nonzero\().\) Now multiply both sides on the left by \(A\) to get \[ a_1 \lambda_1 v_1 + a_2 \lambda_2 v_2 + \cdots + a_k \lambda_k v_k = \lambda_{k+1} v_{k+1}. \] Multiplying both sides of the original equation by \(\lambda_{k+1}\) instead gives \[ a_1 \lambda_{k+1} v_1 + a_2 \lambda_{k+1} v_2 + \cdots + a_k \lambda_{k+1} v_k = \lambda_{k+1} v_{k+1}, \] and subtracting these two equations gives \[ a_1 (\lambda_1-\lambda_{k+1}) v_1 + a_2 (\lambda_2 - \lambda_{k+1}) v_2 + \cdots + a_k (\lambda_k-\lambda_{k+1}) v_k = 0, \] which is impossible because \(v_1,\ldots,v_k\) are linearly independent. Note that it is very important that the \(\lambda_i\) are distinct: at least one of the \(a_i\) is nonzero, so the corresponding coefficient \(a_i(\lambda_i-\lambda_{k+1})\) is nonzero as well. If the \(\lambda_i\) were not distinct, the coefficients on the left side might all be zero even if some of the \(a_i\) were nonzero.
So \(k=n\) and \(v_1,\ldots,v_n\) are linearly independent. This gives a basis of eigenvectors of \(A,\) and hence \(A\) is diagonalizable. Indeed, let \(P\) be the matrix whose column vectors are the \(v_i,\) and let \(e_i\) be the \(i^\text{th}\) column of the identity matrix; then \(P(e_i) = v_i\) for all \(i.\) So \[ (PD)(e_i) = P(\lambda_i e_i) = \lambda_i v_i = A(v_i) = (AP)(e_i). \] But multiplying a matrix by \(e_i\) just gives its \(i^\text{th}\) column. So \(PD\) and \(AP\) have the same \(i^\text{th}\) column, for all \(i.\) So they're the same matrix: \(PD = AP,\) or \(PDP^{-1} = A.\) \(_\square\)
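As a quick numerical sanity check of the theorem, one can verify the column-by-column identity \(PD = AP\) from the proof. The sketch below uses NumPy, and the particular matrix is just an assumed example with three distinct eigenvalues.

```python
import numpy as np

# A triangular matrix with three distinct eigenvalues (2, 3, 5),
# so the theorem guarantees a basis of eigenvectors.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 5.0]])

eigvals, P = np.linalg.eig(A)   # columns of P are the eigenvectors v_i
D = np.diag(eigvals)

assert np.allclose(P @ D, A @ P)                 # PD = AP
assert np.allclose(A, P @ D @ np.linalg.inv(P))  # so A = P D P^{-1}
```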
Non-diagonalizable Matrices
The intuition from the theorem in the previous section is that there are two ways that a matrix can fail to be diagonalizable. One is that its eigenvalues can "live" in some other, larger field. This is in some sense a cosmetic issue, which can be corrected by passing to the larger field. For instance, if the matrix has real entries, its eigenvalues may be complex, so that the matrix may be diagonalizable over \(\mathbb C\) without being diagonalizable over \(\mathbb R.\)
The rotation matrix \(R = \begin{pmatrix} 0&-1\\1&0 \end{pmatrix}\) is not diagonalizable over \(\mathbb R.\) Indeed, it has no real eigenvalues: if \(v\) is a vector in \({\mathbb R}^2,\) then \(Rv\) equals \(v\) rotated counterclockwise by \(90^\circ.\) But it is not hard to check that it has two distinct eigenvalues over \(\mathbb C,\) since the characteristic polynomial is \(t^2+1 = (t+i)(t-i).\) So \(R\) is diagonalizable over \(\mathbb C.\)
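A short sketch (again using NumPy, as an added illustration) confirms that the eigenvalues of \(R\) are \(\pm i,\) so \(R\) becomes diagonalizable once complex entries are allowed:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# eig returns complex eigenvalues/eigenvectors for this real matrix.
eigvals, P = np.linalg.eig(R)
print(eigvals)                                   # approximately [1j, -1j]

D = np.diag(eigvals)
assert np.allclose(R, P @ D @ np.linalg.inv(P))  # diagonalizable over C
```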
The second way in which a matrix can fail to be diagonalizable is more fundamental. The fundamental theorem of algebra applied to the characteristic polynomial shows that there are always \(n\) complex eigenvalues, counted with multiplicity. But this does not mean that every square matrix is diagonalizable over the complex numbers. The multiplicity of each eigenvalue is important in deciding whether the matrix is diagonalizable: as we have seen, if each multiplicity is \(1,\) the matrix is automatically diagonalizable. Here is an example where an eigenvalue has multiplicity \(2\) and the matrix is not diagonalizable:
Let \(A = \begin{pmatrix} 1&1 \\ 0&1 \end{pmatrix}.\) Then the characteristic polynomial of \(A\) is \((t-1)^2,\) so there is only one eigenvalue, \(\lambda=1.\) Since similar matrices have the same eigenvalues (indeed, the same characteristic polynomial), if \(A\) were diagonalizable, it would be similar to a diagonal matrix with \(1\) as its only eigenvalue, namely the identity matrix. But the only matrix similar to the identity matrix is the identity matrix: \(PI_2P^{-1} = I_2\) for all \(P.\) So \(A\) cannot be diagonalizable.
There are other ways to see that \(A\) is not diagonalizable, e.g. by computing the dimension of the eigenspace corresponding to \(\lambda=1\) and showing that there is no basis of eigenvectors of \(A.\)
Note that having repeated roots in the characteristic polynomial does not imply that the matrix is not diagonalizable: to give the most basic example, the \(n\times n\) identity matrix is diagonalizable (diagonal, in fact), but it has only one eigenvalue \(\lambda=1\) with multiplicity \(n.\)
More generally, there are two concepts of multiplicity for eigenvalues of a matrix. The multiplicity we have referred to above, the exponent of the factor \((t-\lambda)\) in the characteristic polynomial, is known as the algebraic multiplicity. The dimension of the eigenspace corresponding to \(\lambda\) is called the geometric multiplicity. It is not hard to prove that the algebraic multiplicity is always \(\ge\) the geometric multiplicity, and that \(A\) is diagonalizable \((\)over a field containing all of its eigenvalues, e.g. \(\mathbb C\)\()\) if and only if these multiplicities are equal for every eigenvalue \(\lambda.\) See the wiki on Jordan canonical form for more details.
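These two multiplicities are easy to inspect with a computer algebra system; the following sketch uses SymPy (an added illustration, not part of the argument above) on the non-diagonalizable matrix \(B\) from the example and on the identity matrix:

```python
import sympy as sp

B = sp.Matrix([[1, 1],
               [0, 1]])

# eigenvals() reports algebraic multiplicities ...
print(B.eigenvals())                  # {1: 2}

# ... while eigenvects() also returns a basis of each eigenspace,
# whose length is the geometric multiplicity.
for lam, alg_mult, basis in B.eigenvects():
    print(lam, alg_mult, len(basis))  # 1, 2, 1

print(B.is_diagonalizable())          # False: geometric 1 < algebraic 2
print(sp.eye(2).is_diagonalizable())  # True: the identity is already diagonal
```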
Examples
Diagonalize \(A=\begin{pmatrix} 1&-1\\2&4\end{pmatrix}.\)
The eigenvalues are the roots \(\lambda\) of the characteristic polynomial: \[\begin{align} \det(A-\lambda I)=\begin{vmatrix} 1-\lambda&-1\\2&4-\lambda\end{vmatrix}=0\implies (1-\lambda)(4-\lambda)+2&=0\\ \lambda^2-5\lambda+6&=0\\ \lambda&= 2,3. \end{align}\] We find eigenvectors for both eigenvalues:
- \(\lambda_1=2\implies \begin{pmatrix} -1&-1\\2&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=0.\) One solution, the one we will pick, is \(s_1 =\begin{pmatrix}1\\-1\end{pmatrix}.\)
- \(\lambda_2=3\implies \begin{pmatrix} -2&-1\\2&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=0.\) One solution, the one we will pick, is \(s_2=\begin{pmatrix}-1\\2\end{pmatrix}\).
The prescription in the change-of-basis theorem leads us immediately to the matrices \(P\) and \(D\):
- \(P=\begin{pmatrix}s_1&s_2 \end{pmatrix}=\begin{pmatrix}1&-1\\-1&2 \end{pmatrix}\)
- \(D=\begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix}=\begin{pmatrix}2&0\\0&3\end{pmatrix}\).
Then we have
\[A=PD P^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}\begin{pmatrix}2&0\\0&3\end{pmatrix}\begin{pmatrix}2&1\\1&1\end{pmatrix}.\ _\square\]
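The hand computation can be checked with SymPy's built-in diagonalize (an added sketch; the eigenvectors it returns may differ from \(s_1, s_2\) by scaling or ordering):

```python
import sympy as sp

A = sp.Matrix([[1, -1],
               [2,  4]])

P, D = A.diagonalize()       # returns P, D with A = P D P^{-1}
assert A == P * D * P.inv()
print(D)                     # Matrix([[2, 0], [0, 3]]), up to ordering
```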
Diagonalize \(A=\begin{pmatrix}2&1&1\\-1&0&-1\\-1&-1&0 \end{pmatrix}\).
The process starts in the same way:
\[\begin{align} \det(A-\lambda I)=\begin{vmatrix} 2-\lambda&1&1\\-1&-\lambda&-1\\-1&-1&-\lambda\end{vmatrix}&=0\\\\ \implies \lambda^2(2-\lambda)+2+(\lambda-2)-\lambda-\lambda&=0\\ -\lambda^3+2\lambda^2-\lambda&=0\\ \lambda&= 0,1. \end{align}\]
There are only two eigenvalues, and the eigenvalue \(1\) has algebraic multiplicity \(2,\) since the characteristic polynomial factors as \(-t(t-1)^2.\) So it is not clear whether \(A\) is diagonalizable until we know whether there are enough eigenvectors in the \(1\)-eigenspace \((\)i.e. whether the geometric multiplicity of \(1\) is \(1\) or \(2).\)
We find eigenvectors for these eigenvalues:
\(\lambda_1 = 0:\)
\(N(A-\lambda_1 I ) = N(A),\) which can be computed by Gauss-Jordan elimination: \[ \begin{align} \begin{pmatrix}2&1&1\\-1&0&-1\\-1&-1&0 \end{pmatrix} &\rightarrow \begin{pmatrix}-1&0&-1\\2&1&1\\-1&-1&0 \end{pmatrix} \\ &\rightarrow \begin{pmatrix}-1&0&-1\\0&1&-1\\-1&-1&0 \end{pmatrix} \\ &\rightarrow \begin{pmatrix} 1&0&1\\0&1&-1\\-1&-1&0 \end{pmatrix} \\ &\rightarrow \begin{pmatrix} 1&0&1\\0&1&-1\\0&0&0 \end{pmatrix}, \end{align} \] which has nullspace spanned by the vector \(s_1 = \begin{pmatrix} -1\\1\\1 \end{pmatrix}.\)

\(\lambda_2 = 1:\)

\(N(A-\lambda_2I) = N(A-I),\) which can be computed by Gauss-Jordan elimination: \[ \begin{pmatrix} 1&1&1 \\ -1&-1&-1 \\ -1&-1&-1 \end{pmatrix} \rightarrow \begin{pmatrix} 1&1&1\\0&0&0 \\ 0&0&0 \end{pmatrix}, \] which has a two-dimensional nullspace, spanned by, for instance, the vectors \(s_2 = \begin{pmatrix} 1\\-1\\0\end{pmatrix}\) and \(s_3 = \begin{pmatrix} 1\\0\\-1 \end{pmatrix}.\)

So this shows that \(A\) is indeed diagonalizable, because there are "enough" eigenvectors to span \({\mathbb R}^3.\)
And we can write down the matrices \(P\) and \(D\):
- \(P=\begin{pmatrix}-1&1&1\\1&-1&0\\1&0&-1\end{pmatrix}\)
- \(D=\begin{pmatrix} 0&0&0\\0&1&0\\0&0&1 \end{pmatrix}\).
It is straightforward to check that \(A=PDP^{-1}\) as desired. \(_\square\)
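One way to carry out that check is with SymPy (an added sketch, using the \(P\) and \(D\) found above):

```python
import sympy as sp

A = sp.Matrix([[ 2,  1,  1],
               [-1,  0, -1],
               [-1, -1,  0]])
P = sp.Matrix([[-1,  1,  1],
               [ 1, -1,  0],
               [ 1,  0, -1]])
D = sp.diag(0, 1, 1)

# Confirms A = P D P^{-1} for the eigenvector matrix built from s_1, s_2, s_3.
assert A == P * D * P.inv()
```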
Note that the matrices \(P\) and \(D\) are not unique. \(D\) is unique up to a rearrangement of the diagonal terms, but \(P\) has much more freedom: while the column vectors from the \(1\)-dimensional eigenspaces are determined up to a constant multiple, the column vectors from the larger eigenspaces can be chosen completely arbitrarily as long as they form a basis for their eigenspace.
Applications
Diagonal matrices are relatively easy to compute with, and similar matrices share many properties, so diagonalizable matrices are well-suited for computation. In particular, many applications involve computing large powers of a matrix, which is easy if the matrix is diagonal: \(D^n\) is obtained by raising each diagonal entry to the \(n^\text{th}\) power. If \(A = PDP^{-1},\) then \[ A^n = (PDP^{-1})^n = (PDP^{-1})(PDP^{-1})\cdots(PDP^{-1}) = PD^nP^{-1} \] because the \(P^{-1}P\) terms in the middle all collapse.
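As a concrete check of this collapse, here is a SymPy sketch (an added illustration) that recomputes a power of the matrix from the first worked example using its diagonalization:

```python
import sympy as sp

P = sp.Matrix([[1, -1], [-1, 2]])
D = sp.diag(2, 3)
A = P * D * P.inv()               # the matrix [[1, -1], [2, 4]] from above

n = 8
# D^n just raises the diagonal entries to the n-th power.
A_n = P * sp.diag(2**n, 3**n) * P.inv()
assert A_n == A**n                # matches repeated multiplication
```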
Find a closed-form formula for the \(n^\text{th}\) Fibonacci number \(F_n,\) by looking at powers of the matrix \(A = \begin{pmatrix} 1&1\\1&0 \end{pmatrix}.\)
Looking at the first few powers of \(A,\) we have \[ \begin{align} A^1 &= \begin{pmatrix} 1&1\\1&0 \end{pmatrix} \\ A^2 &= \begin{pmatrix} 2&1\\1&1 \end{pmatrix} \\ A^3 &= \begin{pmatrix} 3&2\\2&1 \end{pmatrix} \\ A^4 &= \begin{pmatrix} 5&3\\3&2 \end{pmatrix} \\ A^5 &= \begin{pmatrix} 8&5\\5&3 \end{pmatrix}, \end{align} \] so the natural conjecture is that \(A^n = \begin{pmatrix} F_{n+1}&F_n\\F_n&F_{n-1} \end{pmatrix},\) which is easy to prove by induction. The base case is clear, and the inductive step is \[ \begin{align} A^n = A \cdot A^{n-1} &= \begin{pmatrix} 1&1\\1&0 \end{pmatrix} \begin{pmatrix} F_n&F_{n-1}\\F_{n-1}&F_{n-2} \end{pmatrix} \\ &= \begin{pmatrix} F_n+F_{n-1}&F_{n-1}+F_{n-2}\\F_n&F_{n-1} \end{pmatrix} \\ &= \begin{pmatrix} F_{n+1}&F_n\\F_n&F_{n-1} \end{pmatrix} \end{align} \] as desired.
So the only thing left to do is to compute \(A^n.\) The characteristic polynomial is \((1-t)(-t)-1 = t^2-t-1,\) whose roots are \(\phi\) and \(\rho,\) where \(\phi\) is the golden ratio and \(\rho = \frac{1-\sqrt{5}}2\) is its conjugate.
The \(\phi\)-eigenspace is the nullspace of \(\begin{pmatrix} 1-\phi&1 \\ 1&-\phi \end{pmatrix},\) which is one-dimensional and spanned by \(\begin{pmatrix} \phi\\1 \end{pmatrix}.\) The same holds with all the \(\phi\)'s replaced by \(\rho\)'s. So the conclusion is that \( A = PDP^{-1},\) where \[ \begin{align} P &= \begin{pmatrix} \phi&\rho\\1&1 \end{pmatrix} \\ D &= \begin{pmatrix} \phi&0\\0&\rho \end{pmatrix} \\ P^{-1} &= \frac1{\sqrt{5}} \begin{pmatrix} 1&-\rho\\-1&\phi \end{pmatrix}. \end{align} \]
Putting this all together gives \[ \begin{align} A^n = (PDP^{-1})^n = PD^nP^{-1} &= \frac1{\sqrt{5}} \begin{pmatrix} \phi&\rho\\1&1 \end{pmatrix} \begin{pmatrix} \phi^n&0\\0&\rho^n \end{pmatrix} \begin{pmatrix} 1&-\rho\\-1&\phi \end{pmatrix} \\ &= \frac1{\sqrt{5}} \begin{pmatrix} \phi^{n+1} & \rho^{n+1} \\ \phi^n & \rho^n \end{pmatrix} \begin{pmatrix} 1&-\rho \\ -1&\phi \end{pmatrix} \\ &= \frac1{\sqrt{5}} \begin{pmatrix} \phi^{n+1}-\rho^{n+1} & * \\ \phi^n - \rho^n & * \end{pmatrix} \end{align} \] (we don't really care about the second column, although it's not much harder to compute).
In particular, the bottom left entry, which is \(F_n\) by induction, equals \[ \frac1{\sqrt{5}} (\phi^n-\rho^n) = \frac{(1+\sqrt{5})^n-(1-\sqrt{5})^n}{2^n\sqrt{5}}, \] which is Binet's formula for \(F_n.\) \(_\square\)
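Binet's formula is easy to test against the recurrence; a small Python sketch (added for illustration, with round absorbing the floating-point error):

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
rho = (1 - sqrt(5)) / 2

def binet(n):
    # F_n = (phi^n - rho^n) / sqrt(5), from the diagonalization above
    return round((phi**n - rho**n) / sqrt(5))

# Compare against the recurrence F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}.
fib = [0, 1]
for _ in range(20):
    fib.append(fib[-1] + fib[-2])

assert all(binet(n) == fib[n] for n in range(len(fib)))
```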
More applications to exponentiation and solving differential equations are in the wiki on matrix exponentiation.