Matrices

A matrix is a rectangular array of numbers, arranged in rows and columns. For instance, the left matrix has two rows and three columns, while the right matrix has three rows and two columns:

\[\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}, \begin{pmatrix}1&2\\3&4\\5&6\end{pmatrix}.\]

Matrices are useful in a variety of fields and form the basis for linear algebra. Their applications include solving systems of linear equations, path-finding in graph theory, and several applications in group theory (especially representation theory). They are also extremely useful in representing linear transformations (especially rotations) and thus complex numbers; for instance, the matrix

\[\begin{pmatrix}a&-b\\b&a\end{pmatrix}\]

represents the complex number \(a+bi\).

Formal Definition

A matrix is a rectangular array of any objects for which addition and multiplication are defined. Generally, these objects are numbers, but it is equally valid to have a matrix of symbols like

\[M = \begin{pmatrix} \clubsuit & \circ & \blacksquare \\ \text{\S} & \checkmark & \bigstar \end{pmatrix}\]

so long as there is a suitable understanding of what (for example) \(\checkmark \times \bigstar\) and \(\blacksquare + \clubsuit\) are. More formally speaking, a matrix's elements can be drawn from any field. However, it is generally best to consider matrices as collections of real numbers.

Generally, in a matrix, the vertical elements are termed as columns and the horizontal elements are termed as rows. The size of a matrix is measured in the number of rows and columns the matrix has. The above matrix, for instance, has 2 rows and 3 columns, and thus it is a \(2 \times 3\) matrix. Matrices that have the same number of rows as columns are called square matrices and are of particular interest.

The elements of a matrix are specified by the row and column they reside in. For example, the \(\checkmark\) in the above matrix \(M\) is at position \((2,2):\) the \(2^\text{nd}\) row and \(2^\text{nd}\) column. More explicitly, \(M_{2,2}=\checkmark\). This notation is especially convenient when the elements are related by some formula; for instance, the matrix

\[M = \begin{pmatrix}2&3&4\\3&4&5\\4&5&6\end{pmatrix}\]

can be more succinctly written as \(M_{i,j}=i+j\) for \(1 \leq i,j \leq 3\), or even more compactly as \(M=(i+j)_{3,3}\) \((\)where \(3,3\) denotes the size of the matrix\().\) The \(i^\text{th}\) row of the matrix can also be denoted by \(M_{i,*},\) and the \(j^\text{th}\) column by \(M_{*,j}.\)

In a given matrix of order \(m \times n,\) there are \(m\cdot n\) elements present. For example, in a 3 by 3 matrix the number of elements are \(3 \times 3 = 9,\) and in case of a 2 by 4 matrix there are \(2 \times 4 = 8\) elements present.

Finally, it is worth defining a matrix with exactly one column as a column vector, as they are especially useful in representing points in the \(n\)-dimensional plane.

Basic Operations

There are several simple operations on matrices and one somewhat complicated one (multiplication). The first is addition: matrix addition is defined only on two matrices of the same size and works by adding corresponding elements:

What is

\[\begin{pmatrix}2&0&5\\1&6&8\end{pmatrix} + \begin{pmatrix} 3 & 5 & 7 \\ 1 & 0 & 2\end{pmatrix}?\]

The matrices are added element-wise, so the result is

\[\begin{pmatrix} 2+3 & 0+5 & 5+7 \\ 1+1 & 6+0 & 8+2 \end{pmatrix} = \begin{pmatrix} 5 & 5 & 12 \\ 2 & 6 & 10 \end{pmatrix}. \ _\square\]

If \(A = \begin{pmatrix} 2 & 3 & 1 \\ 6 & -1 & 5 \end{pmatrix}\) and \(B = \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix},\) then find a matrix \(X\) such that \(A + B - 2X = 0\).

We have

\[\begin{align} A + B - 2X = 0 \implies X & = \dfrac{A + B}{2} \\ & = \dfrac12 \Bigg(\begin{pmatrix} 2 & 3 & 1 \\ 6 & -1 & 5 \end{pmatrix} + \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 3 \end{pmatrix}\Bigg) \\ & = \dfrac12 \begin{pmatrix} 3 & 5 & 0 \\ 6 & -2 & 8 \end{pmatrix} \\ & = \begin{pmatrix} \dfrac32 & \dfrac52 & 0 \\ 3 & -1 & 4 \end{pmatrix} .\ _\square \end{align}\]

More formally, we can state as follows:

The sum \(S\) of two matrices \(A,B\) of the same size satisfies the relation

\[S_{i,j} = A_{i,j} + B_{i,j}\]

for all \(i,j\) within the size of the matrices.

It is also possible to multiply matrices by scalars, i.e. single numbers, by multiplying element-wise:

What is

\[3.5\begin{pmatrix}2&0&5\\1&6&8\end{pmatrix}?\]

The elements are each multiplied by 3.5, so the result is

\[\begin{pmatrix} 3.5 \cdot 2 & 3.5 \cdot 0 & 3.5 \cdot 5 \\ 3.5 \cdot 1 & 3.5 \cdot 6 & 3.5 \cdot 8 \end{pmatrix} = \begin{pmatrix} 7 & 0 & 17.5 \\ 3.5 & 21 & 28 \end{pmatrix}.\ _\square\]

More formally, we can state as follows:

The product \(P\) of a constant \(c\) and matrix \(A\) satisfies the relation

\[P_{i,j} = c \cdot A_{i,j}\]

for all \(i,j\) within the size of the matrices.

Matrix Multiplication

Finally, there is the more complicated operation of matrix multiplication. The product of two matrices is defined only when the number of columns of the first matrix is the same as the number of rows of the second; in other words, it is only possible to multiply \(m \times n\) and \(n \times p\) size matrices. The reason for this becomes clear upon defining the product:

The product \(P\) of an \(m \times n\) matrix \(A\) and an \(n \times p\) matrix \(B\) satisfies

\[P_{i,j} = A_{i,*} \cdot B_{*,j}\]

for all \(i,j\) within the size of the matrices.

Here \(A_{i,*}\) denotes the \(i^\text{th}\) row of \(A\), which is a vector, and \(B_{*,j}\) denotes the \(j^\text{th}\) column of \(B\), which is also a vector. Thus, the dot \((\cdot)\) in this sense refers to multiplying vectors, defined by the dot product. Note that \(i\) and \(j\) are defined on \(1 \leq i \leq m\) and \(1 \leq j \leq p\), so the product \(P\) will be an \(m \times p\) matrix.

This rule seems rather arbitrary, so it is best illustrated by an example:

What is

\[\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\begin{pmatrix}1&2\\3&4\\5&6\end{pmatrix}?\]

Firstly, note that the first matrix is \(2 \times 3\) and the second is \(3 \times 2\), so their product is indeed defined and will be a \(2 \times 2\) matrix. Firstly consider the \(1,1\) element of the product:

\[\begin{pmatrix}\color{green}{P_{1,1}} & P_{1,2}\\P_{2,1}&P_{2,2}\end{pmatrix} = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\begin{pmatrix}1&2\\3&4\\5&6\end{pmatrix}.\]

It is equal to the dot product of the \(1^\text{st}\) row of the first matrix and the \(1^\text{st}\) column of the second matrix, i.e.

\[\begin{align} \begin{pmatrix}\color{green}{P_{1,1}} & P_{1,2}\\P_{2,1}&P_{2,2}\end{pmatrix} &= \begin{pmatrix}\color{red}{1}&\color{red}{2}&\color{red}{3}\\4&5&6\end{pmatrix}\begin{pmatrix}\color{blue}{1}&2\\\color{blue}{3}&4\\\color{blue}{5}&6\end{pmatrix}\\\\ \color{green}{P_{1,1}} &= (\color{red}{1, 2, 3}\color{black}{) \cdot (}\color{blue}{1, 3, 5}\color{black}{)} \\ &= \color{red}{1~}\color{black}{\cdot}\color{blue}{~1~}\color{black}{+}\color{red}{~2~}\color{black}{\cdot}\color{blue}{~3~}\color{black}{+}\color{red}{~3~}\color{black}{\cdot}\color{blue}{~5}\\ &=22. \end{align}\]

So the top left entry of the result is 22. The rest of the matrix can be filled out in the same way; for instance,

\[\begin{pmatrix}P_{1,1} & \color{green}{P_{1,2}}\\P_{2,1}&P_{2,2}\end{pmatrix} = \begin{pmatrix}\color{red}{1}&\color{red}{2}&\color{red}{3}\\4&5&6\end{pmatrix}\begin{pmatrix}1&\color{blue}2\\3&\color{blue}{4}\\5&\color{blue}{6}\end{pmatrix}.\]

The final result is

\[P = \begin{pmatrix}22&28\\49&64\end{pmatrix}.\ _\square\]

If \(A = \begin{pmatrix} 1 & -2 & 3 \\ 2 & 3 & -1 \\ -3 & 1 & 2 \end{pmatrix}\) and \(B = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{pmatrix},\) then find \(AB\) and \(BA\). What can you conclude from the final two matrices?

Both \(A\) and \(B\) are square matrices of order \(3 \times 3\). Hence both \(AB\) and \(BA\) are well-defined and are matrices of the same order \(3 \times 3\).

\[\begin{align} AB & = \begin{pmatrix} 1 & -2 & 3 \\ 2 & 3 & -1 \\ -3 & 1 & 2 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{pmatrix} \\ & = \begin{pmatrix} 1 \cdot 1 + (-2) \cdot 0 + 3 \cdot 1 & 1 \cdot 0 + (-2) \cdot 1 + 3 \cdot 2 & 1 \cdot 2 + (-2) \cdot 2 + 3 \cdot 0 \\ 2 \cdot 1 + 3 \cdot 0 + (-1) \cdot 1 & 2 \cdot 0 + 3 \cdot 1 + (-1) \cdot 2 & 2 \cdot 2 + 3 \cdot 2 + (-1) \cdot 0 \\ (-3) \cdot 1 + 1 \cdot 0 + 2 \cdot 1 & -3 \cdot 0 + 1 \cdot 1 + 2 \cdot 2 & -3 \cdot 2 + 1 \cdot 2 + 2 \cdot 0 \end{pmatrix} \\ & = \begin{pmatrix} 4 & 4 & -2 \\ 1 & 1 & 10 \\ -1 & 5 & -4 \end{pmatrix} \\\\ BA & = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \\ 1 & 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & -2 & 3 \\ 2 & 3 & -1 \\ -3 & 1 & 2 \end{pmatrix} \\ & = \begin{pmatrix} -5 & 0 & 7 \\ -4 & 5 & 3 \\ 5 & 4 & 1 \end{pmatrix}. \end{align}\]

Clearly, you can see that \(AB \neq BA\). Thus, we can conclude that multiplication of matrices need not be commutative. \(_\square\)

It is still admittedly unclear why matrix multiplication is defined this way. One major reason is in the use of systems of linear equations. The coefficients of each equation can be assembled into a coefficient matrix, and the variables can be arranged into a column vector. The product of the coefficient matrix and the column vector will itself be a column vector, the values of each equation. For example, the system of equations

\[\left\lbrace \begin{align} x + 2y + 3z &= 9 \\ 3x + y + 4z &= 12 \\ 2x + 4y - z &= 4 \end{align} \right. \]

can be more succinctly written in the form

\[\begin{pmatrix}1&2&3\\3&1&4\\2&4&-1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}9\\12\\4\end{pmatrix}.\]

This is a very useful transformation besides the saving of space; in particular, if it were possible to "divide" matrices, it would be easy to find out what \(x,y,z\) are by dividing out the coefficient matrix. Unfortunately, division takes some more effort to define, so further explanation of this is left to a later section.

As a warning about matrix multiplication, it is extremely important to understand the following:

Matrix multiplication is not commutative. In other words, it is not generally true that \(AB = BA\).

The simplest way to see this is that matrix multiplication is defined only on \(m \times n\) and \(n \times p\) matrices; reversing their order gives the product of an \(n \times p\) matrix and an \(m \times n\) matrix, and \(p\) is not necessarily equal to \(m\). Even in such a case (e.g. square matrices), the multiplication is generally not commutative. Matrices \(A,B\) that do indeed satisfy \(AB = BA\) are (appropriately) called commuting matrices.

Do the two matrices \(\begin{pmatrix}1&1\\0&0\end{pmatrix}\) and \(\begin{pmatrix}2&0\\1&1\end{pmatrix}\) commute?

No, since

\[\begin{pmatrix}1&1\\0&0\end{pmatrix}\begin{pmatrix}2&0\\1&1\end{pmatrix} = \begin{pmatrix}3&1\\0&0\end{pmatrix}\]

but

\[\begin{pmatrix}2&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&0\end{pmatrix} = \begin{pmatrix}2&2\\1&1\end{pmatrix}.\ _\square\]

Finally, it is worth noting a special matrix: the identity matrix

\[I_n = \begin{pmatrix}1&0&0&\ldots&0\\0&1&0&\ldots&0\\0&0&1&\ldots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\ldots&1\end{pmatrix},\]

which is an \(n \times n\) matrix that is zero everywhere except for the main diagonal, which contains all 1s. For instance,

\[I_2 = \begin{pmatrix}1&0\\0&1\end{pmatrix}, \qquad I_3 = \begin{pmatrix}1 & 0 & 0 \\0 & 1 & 0 \\ 0 & 0 & 1\end{pmatrix}.\]

It satisfies the property that

\[IA = AI = A\]

for any \(n \times n\) matrix \(A\). The reason should be clear from the above definitions.

Transpose and Determinant

Two useful functions on matrices are the transpose and the determinant. The transpose of an \(m \times n\) matrix \(A\) is an \(n \times m\) matrix \(A^T\) such that the rows of \(A\) are the columns of \(A^T\), and the columns of \(A\) are the rows of \(A^T\). For instance,

\[\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}^T = \begin{pmatrix}1&4\\2&5\\3&6\end{pmatrix}.\]

The transpose satisfies a few useful properties:

\((A+B)^T = A^T + B^T\)
\((AB)^T = B^TA^T\)
\((A^T)^T = A.\)

The second of these is the most useful since it (roughly) means that properties true of left multiplication hold true for right multiplication as well.

More interesting is the determinant of a matrix. There are several equally valid definitions of the determinant, though all would seem arbitrary at this point without an understanding of what the determinant is supposed to compute.

Formally, the determinant is a function \(\text{det}\) from the set of square matrices to the set of real numbers that satisfies 3 important properties:

\(\text{det}(I) = 1;\)

\(\text{det}\) is linear in the rows of the matrix;

if two rows of a matrix \(M\) are equal, \(\det(M)=0.\)

The second condition is by far the most important. It means that any of the rows of the matrix is written as a linear combination of two other vectors, and the determinant can be calculated by "splitting" that row. For instance, in the below example, the second row \((0,2,3)\) can be written as \(2 \cdot (0,1,0) + 3 \cdot (0,0,1)\), so

\[\text{det}\begin{pmatrix}1&0&0\\0&2&3\\0&0&1\end{pmatrix} = 2 \cdot \text{det}\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}+3 \cdot \text{det}\begin{pmatrix}1&0&0\\0&0&1\\0&0&1\end{pmatrix}=2.\]

A key theorem shows this:

There is exactly one function satisfying the above 3 relations.

Unfortunately, this is very difficult to work with for all but the simplest matrices, so an alternate definition is better to use. There are two major ones: determinant by minors and determinant by permutations.

The first of the two, determinant by minors, uses recursion. The base case is simple: the determinant of a \(1 \times 1\) matrix with element \(a\) is simply \(a\). Note that this agrees with the conditions above, since

\[\text{det}\begin{pmatrix}a\end{pmatrix} = a \cdot \text{det}\begin{pmatrix}1\end{pmatrix}=a\]

because \(\text{det}\begin{pmatrix}1\end{pmatrix} = I\). The recursive step is as follows: denote by \(A_{ij}\) the matrix formed by deleting the \(i^\text{th}\) row and \(j^\text{th}\) column. For instance,

\[A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix} \implies A_{11} = \begin{pmatrix}5&6\\8&9\end{pmatrix}.\]

Then the determinant is given as follows:

The determinant of an \(n \times n\) matrix \(A\) is

\[\text{det}(A) = \sum_{i=1}^n (-1)^{i+1}a_{1,i}\text{det}(A_{1i}) = a_{1,1}\text{det}A_{11}-a_{1,2}\text{det}A_{12}+\cdots.\]

For example,

What is the determinant of \(\begin{pmatrix}a&b\\c&d\end{pmatrix}?\)

We write

\[\text{det}\begin{pmatrix}a&b\\c&d\end{pmatrix} = a ~\text{det}\begin{pmatrix}d\end{pmatrix} - b ~\text{det}\begin{pmatrix}c\end{pmatrix} = ad-bc.\ _\square\]

Unfortunately, these calculations can get quite tedious; already for \(3 \times 3\) matrices, the formula is too long to memorize in practice.

An alternate definition uses permutations. Let \(\sigma\) be a permutation of \(\{1, 2, 3, \ldots, n\}\), and \(S\) the set of those permutations.

Then the determinant of an \(n \times n\) matrix \(A\) is

\[\sum_{\sigma \in S}\left(\text{sgn}(\sigma)\prod_{i=1}^{n}a_{i,\sigma(i)}\right).\]

This may look more intimidating than the previous formula, but in fact it is more intuitive. It essentially says the following:

Choose \(n\) elements of \(A\) such that no two are in the same row and no two are in the same column, and multiply them, possibly also by \(-1\) if the permutation has an odd sign. The determinant is the sum over all choices of these \(n\) elements.

This definition is especially useful when the matrix contains many zeros, as then most of the products vanish.

Here is an example:

What is the determinant of \(\begin{pmatrix}a&b\\c&d\end{pmatrix}?\)

There are two permutations of \(\{1,2\}:\) \(\{1,2\}\) itself and \(\{2,1\}\). The first has a positive sign (as it has 0 transpositions) and the second has a negative sign (as it has 1 transposition), so the determinant is

\[\text{det}(A) = \sum_{\sigma \in S}\left(\text{sgn}(\sigma)\prod_{i=1}^{n}a_{i,\sigma(i)}\right) = 1 \cdot a_{1,1}a_{2,2} + (-1) \cdot a_{1,2}a_{2,1} = ad-bc.\]

Unsurprisingly, this is the same result as above. \(_\square\)

The determinant is a very important function because it satisfies a number of additional properties that can be derived from the 3 conditions stated above. They are as follows:

Multiplicativity: \(\text{det}(AB)=\text{det}(A)\text{det}(B).\)
Invariance under row operations: If \(A'\) is a matrix formed by adding a multiple of any row to another row, then \(\text{det}(A)=\text{det}(A').\)
Invariance under transpose: \(\text{det}(A) = \text{det}(A^T).\)
Sign change under row swap: If \(A'\) is a matrix formed by swapping the positions of two rows, then \(\text{det}(A')=-\text{det}(A).\)

As the next section shows, the multiplicative property is of special importance.

Inverting Matrices

At the end of the matrix multiplication section, it was noted that "dividing" matrices would be an extremely useful operation. To attempt to create one, it is important to understand the definition of division in numbers:

Dividing by a number \(c\) is equivalent to multiplying by a number \(\frac{1}{c}.\)

In other words, dividing by \(c\) is equivalent to multiplying by a number \(d\) such that \(cd = 1\). This makes sense for what division "should" do: dividing by \(c\) followed by multiplying by \(c\) should be the equivalent of doing nothing, i.e. multiplying by 1. The above definition ensures this.

Matrix "division," should it exist, should follow the same principle: multiplying by a matrix and then dividing by it should be the equivalent of doing nothing. In matrix multiplication, however, the equivalent of doing nothing is multiplying by \(I\).

This leads to a natural definition:

The inverse of a matrix \(A\) is a matrix \(A^{-1}\) such that \(AA^{-1}=A^{-1}A=I\).

A natural question to ask is whether all matrices have inverses. Unfortunately, the answer is no, but this should not be surprising: not all numbers have inverses either (it is impossible to divide by 0). Indeed, the multiplicative property of the determinant from the previous section shows this: since \(AA^{-1} = I\),

\[\begin{align} \text{det}\big(AA^{-1}\big) &= \text{det}(I)\\ \text{det}(A)\text{det}\big(A^{-1}\big) &= \text{det}(I) \\ &= 1. \end{align}\]

So it is necessary for \(\text{det}(A)\) to be nonzero. It is somewhat more difficult for this to be a sufficient condition, but this is indeed the case:

A matrix has an inverse if and only if it has a nonzero determinant.

It is worth remembering the formula in the \(2 \times 2\) case:

The inverse of the matrix \(\begin{pmatrix}a&b\\c&d\end{pmatrix}\), if it exists, is

\[\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}.\]

This isn't too difficult to verify.

In general, the inverse matrix can be found by analyzing the cofactor matrix, an \(n \times n\) matrix \(\text{cof}(A)\)satisfying

\[\text{cof}(A)_{i,j} = (-1)^{i+j}\text{det}A_{ji},\]

where \(A_{ji}\) refers to the matrix formed by removing the \(j^\text{th}\) row and \(i^\text{th}\) column from \(A\). This matrix satisfies the property that

\[\text{cof}(A)A = A~\text{cof}(A) = \text{det}(A)I.\]

This provides yet another reason that \(A\) is invertible if and only if it has nonzero determinant. It is also worth noting that the \(2 \times 2\) cofactor matrix is \(\begin{pmatrix}d & -b\\-c & a \end{pmatrix}\), which aligns with the formula above.

Solving Systems of Linear Equations

See full article here: Solving Linear Systems Using Matrices.

The above sections provide a general method for solving systems of linear equations:

Arrange the coefficients in the coefficient matrix \(A\).
Arrange the variables in a vector \(\mathbf{v}\).
Arrange the resulting values into another vector \(\mathbf{b}\). The goal is now to solve the equation \(A\mathbf{v} = \mathbf{b}.\)
Calculate \(A^{-1}\), the inverse of \(A\), for example, by the cofactor method from the previous section.
Multiply both sides of the above equation by \(A^{-1}\) on the left. Then \(\mathbf{v}=A^{-1}\mathbf{b}\), which is a simple matrix multiplication.

There is one potential pitfall: the inverse of \(A\) does not exist. This means that the determinant of \(A\) is 0, so there are two rows of \(A\) that are multiples of one another; this means that the original system of equations had two equations that were multiples of one another. This means that there are either no or infinite solutions.

Contents