Spectral Theorem
In linear algebra, one is often interested in the canonical forms of a linear transformation. Given a particularly nice basis for the vector spaces in which one is working, the matrix of a linear transformation may also be particularly nice, revealing some information about how the transformation operates on the vector space.
The spectral theorem provides a sufficient criterion for the existence of a particular canonical form. Specifically, the spectral theorem states that if a real matrix \(M\) equals its own transpose, then \(M\) is diagonalizable: there exists an invertible matrix \(C\) such that \(C^{-1} MC \) is a diagonal matrix. Recall that a diagonal matrix is any matrix for which all entries off the main diagonal (the diagonal from top left to bottom right) are zero.
A matrix \(M\) with entries in \(\mathbb{R}\) is called symmetric if \(M =M^{T}\). The spectral theorem states that any symmetric matrix is diagonalizable; in fact, as the proof below shows, the diagonalizing matrix \(C\) can be chosen to be orthogonal, so that \(\mathbb{R}^n\) has an orthonormal basis consisting of eigenvectors of \(M\).
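For a concrete illustration of the theorem, consider the symmetric matrix
\[M = \begin{pmatrix} 5 & -2 \\ -2 & 2 \end{pmatrix}.\]
Its characteristic polynomial is \(\lambda^{2} - 7\lambda + 6 = (\lambda - 6)(\lambda - 1),\) so the eigenvalues are \(6\) and \(1\), with orthogonal eigenvectors \((2,-1)\) and \((1,2)\). Taking
\[C = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix},\]
whose columns are the normalized eigenvectors (so that \(C^{-1} = C^{T}\)), gives
\[C^{-1}MC = \begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}.\]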
Motivation
Consider the matrix
\[A= \begin{pmatrix} 2 & 6 \\ 0 & -1 \end{pmatrix}.\]
The eigenvalues of this matrix are \(\lambda_1 = 2\) and \(\lambda_2 = -1\), with corresponding eigenvectors \(v_1 = (1,0)\) and \(v_2 = (-2,1)\). These eigenvectors \(\{v_1, v_2\}\) form a basis for \(\mathbb{R}^2\), with change of basis matrix
\[C= \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix},\]
sending the standard basis vector \(e_1 = (1,0)\) to \(v_1\) and \(e_2 = (0,1)\) to \(v_2\). In the basis \(\{v_1, v_2\}\), the matrix for the linear transformation described by \(A\) is precisely
\[C^{-1} AC = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}.\]
This diagonal matrix is easier to parse than \(A\); looking at it immediately tells one that \(A\) acts on \(\mathbb{R}^2\) by scaling along the two eigenvector directions, one by a factor of \(2\) and the other by a factor of \(-1\) (i.e. a reflection).
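The eigenvalues of \(A\) used above can be computed directly: since \(A\) is upper triangular,
\[\det(A - \lambda I) = \det\begin{pmatrix} 2-\lambda & 6 \\ 0 & -1-\lambda \end{pmatrix} = (2-\lambda)(-1-\lambda),\]
which vanishes exactly when \(\lambda = 2\) or \(\lambda = -1\), and solving \((A - \lambda I)v = 0\) for each value yields the eigenvectors \(v_1 = (1,0)\) and \(v_2 = (-2,1)\).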
A matrix \(M\) is called diagonalizable if there is a basis in which the linear transformation described by \(M\) has a diagonal matrix, i.e. a matrix whose entries off the main diagonal (the diagonal from top left to bottom right) are all zero. Equivalently, \(M\) is diagonalizable if and only if there exists an invertible matrix \(C\) such that \(C^{-1}M C\) is a diagonal matrix. Diagonalizable matrices are easier to work with than non-diagonalizable matrices since they can be placed in this canonical diagonal form with a change of basis.
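Not every matrix is diagonalizable. For example, the (non-symmetric) matrix
\[N = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\]
has \(1\) as its only eigenvalue, and the corresponding eigenspace is the one-dimensional span of \((1,0)\), so there is no basis of \(\mathbb{R}^2\) consisting of eigenvectors of \(N\). The spectral theorem guarantees that this obstruction never arises for symmetric matrices.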
Proof of Spectral Theorem
Let's start off by proving the following useful lemma:
Lemma: All the eigenvalues of a real \(n \times n \) symmetric matrix \(M\) are real.
Proof:
Let \(\lambda \in \mathbb{C} \) be an eigenvalue of \(M\) with corresponding eigenvector \( v \in \mathbb{C}^n \). We will show that \( \overline{\lambda} = \lambda \) by evaluating \( (Mv)^{T} \overline{v} \) in two ways:
\[\begin{align}
(Mv)^{T} \overline{v}
&= v^{T}M^{T}\overline{v} \\
&= v^{T}\big(M\overline{v}\big) \\
&= v^{T}\big(\overline{M}\overline{v}\big) \\
&= v^{T}\big(\overline{Mv}\big) \\
&= v^{T}\big(\overline{\lambda v}\big) \\
&= \overline{\lambda}\,v^{T}\overline{v} &\qquad (1) \\\\
(Mv)^{T} \overline{v}
&= (\lambda v)^{T}\overline{v} \\
&= \lambda v^{T} \overline{v}. &\qquad (2)
\end{align} \]
Here the second line uses \(M^{T} = M\) (symmetry), the third line uses \(\overline{M} = M\) (\(M\) has real entries), and both computations use \(Mv = \lambda v\).
Since \(v \neq 0\), we have \(v^{T}\overline{v} = \sum_{i} |v_{i}|^{2} > 0\), so comparing equations (1) and (2) gives \( \overline{\lambda} = \lambda,\) i.e. \(\lambda \in \mathbb{R}.\ _\square \)
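The symmetry hypothesis is essential here; a real matrix that is not symmetric can have non-real eigenvalues. For example, the rotation matrix
\[J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\]
has characteristic polynomial \(\lambda^{2} + 1,\) so its eigenvalues are \(\pm i \notin \mathbb{R}.\)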
Proof by Induction
Base Case:
For \(n = 1,\) \(M\) is a \(1 \times 1\) matrix, so it is already diagonal: writing \(M = \lambda,\) any non-zero real scalar \(v\) satisfies \(Mv = \lambda v\) and, after normalizing, forms an orthonormal basis of eigenvectors for \(\mathbb{R}. \)
Induction Hypothesis:
Every \( k \times k \) real symmetric matrix can be diagonalized by an orthogonal matrix, for \(k = 1, \ldots, n-1.\) So, let \(M \in M_{n} (\mathbb{R}) \) be symmetric. We break the inductive step into three steps:
Step 1:
By the fundamental theorem of algebra, the characteristic polynomial of \(M\) has a root \(\lambda_{1} \in \mathbb{C},\) and by the above lemma \(\lambda_{1} \in \mathbb{R}.\) Since \(M - \lambda_{1}I\) is then a singular real matrix, we may choose a real eigenvector \(v_{1}\) and normalize it so that \( |v_{1}| = 1 \). Now extend \(v_{1}\) to a basis \( v_{1}, u_{2}, \ldots , u_{n} \) of \(\mathbb{R}^n \) and apply the Gram-Schmidt procedure to obtain an orthonormal basis \( B = \{v_{1}, v_{2}, \ldots , v_{n}\} \) of \(\mathbb{R}^n \) (Gram-Schmidt leaves the unit vector \(v_{1}\) unchanged).
Step 2:
Define \(P= [v_{1}\ v_{2}\ \ldots\ v_{n}] \), the matrix whose columns are the vectors of \(B\). The columns of \(P\) are orthonormal, so \(P\) is an orthogonal matrix; that is, \( P^{T} = P^{-1}\). Also define \( A= P^{-1}MP = P^{T}MP; \) then \[ A^{T} = \big(P^{T}MP\big)^{T} = P^{T}M^{T}\big(P^{T}\big)^{T} = P^{T}M^{T}P = P^{T}MP = A, \] so \(A\) is symmetric. Now we use this to write \(A\) in block form. Applying \(A\) to the standard basis vector \(\mathbf{e_{1}} \) gives \[ A\mathbf{e_{1}} = P^{T}MP\mathbf{e_{1}} = P^{T}M v_{1} = \lambda_{1}P^{T}v_{1} = \lambda_{1}\mathbf{e_{1}}, \] since \(P\mathbf{e_{1}} = v_{1}\) and \(P^{T}v_{1} = \mathbf{e_{1}}\) (the columns of \(P\) are orthonormal and the first column is \(v_{1}\)). So the first column of \(A\) is \(\lambda_{1}\mathbf{e_{1}},\) and because \(A\) is symmetric its first row is \((\lambda_{1}, 0, \ldots, 0)\) as well. Hence \(A\) has the block form \[ A = \begin{bmatrix} \lambda_{1} & 0 \\ 0 & C \end{bmatrix}, \] where \( C \in M_{n-1}(\mathbb{R}) \) is symmetric.
Step 3:
Now, by the induction hypothesis applied to \(C\), there exist a diagonal matrix \(D\) and an orthogonal matrix \(Q\) such that \( D= Q^{-1} CQ = Q^{T} CQ \). Define \[ R = P \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}. \] We aim to show that \(R\) is orthogonal and \( R^{T} MR \) is diagonal. For the first part, \[ \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & Q^{-1} \end{bmatrix} \implies R^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & Q^{-1} \end{bmatrix} P^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & Q^{T} \end{bmatrix} P^{T} = R^{T}. \] Finally, \[ R^{T}MR = \begin{bmatrix} 1 & 0 \\ 0 & Q^{T} \end{bmatrix} P^{T}MP \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & Q^{T} \end{bmatrix} \begin{bmatrix} \lambda_{1} & 0 \\ 0 & C \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} \lambda_{1} & 0 \\ 0 & Q^{T}CQ \end{bmatrix} = \begin{bmatrix} \lambda_{1} & 0 \\ 0 & D \end{bmatrix}. \] So there exists an orthogonal matrix \(R\) such that \( R^{T}MR \) is diagonal, which is equivalent to saying that there exists an orthonormal basis for \(\mathbb{R}^n \) consisting of eigenvectors of \(M.\ _\square \)
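As a small illustration of the construction, take the symmetric matrix
\[M = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.\]
One eigenvalue is \(\lambda_{1} = 3\) with unit eigenvector \(v_{1} = \tfrac{1}{\sqrt{2}}(1,1)\). Extending to the orthonormal basis \(\left\{\tfrac{1}{\sqrt{2}}(1,1),\ \tfrac{1}{\sqrt{2}}(1,-1)\right\}\) gives
\[P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad P^{T}MP = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix},\]
so here the \(1 \times 1\) block \(C = (-1)\) is already diagonal and \(R = P\) diagonalizes \(M\) orthogonally, in agreement with the theorem.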