Linear Algebra
Linear algebra is the area of mathematics concerned primarily with vector spaces and the linear transformations between them.
Linear algebra initially emerged as a method for solving systems of linear equations. Problems like the following appear throughout mathematics, science, and engineering, giving linear algebra a very broad range of applications:
Solve the following system of linear equations for \(x\), \(y\), and \(z\):
\[\begin{array}{r c r c r c r l} x & + & 2y & + & z & = & 12 \\ 2x & - & y & + & z & = & 1 \\ x & + & y & - & 3z & = & -4 &. \\ \end{array}\]
While there are many ways to solve these types of systems, one of special interest is to treat the three equations as a single linear transformation and look at its corresponding matrix:
\[ \begin{array}{r c r c r c r} x & + & 2y & + & z & = & 12 \\ 2x & - & y & + & z & = & 1 \\ x & + & y & - & 3z & = & -4 \\ \end{array} \implies
\begin{pmatrix} 1 & 2 & 1\\ 2 & -1 & 1 \\ 1 & 1 & -3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 12 \\ 1 \\ -4 \end{pmatrix}. \]Then, \(\begin{pmatrix} x \\ y \\ z \end{pmatrix}\) is an element of the vector space \(\mathbb{R}^3\), and the matrix describes a linear transformation from \(\mathbb{R}^3\) to itself. Finding the matrix's inverse then yields the answer:
\[\begin{align} \begin{pmatrix} \frac{2}{19} & \frac{7}{19} & \frac{3}{19} \\ \frac{7}{19} & -\frac{4}{19} & \frac{1}{19} \\ \frac{3}{19} & \frac{1}{19} & -\frac{5}{19} \end{pmatrix} \cdot \begin{pmatrix} 1 & 2 & 1\\ 2 & -1 & 1 \\ 1 & 1 & -3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} & = \begin{pmatrix} \frac{2}{19} & \frac{7}{19} & \frac{3}{19} \\ \frac{7}{19} & -\frac{4}{19} & \frac{1}{19} \\ \frac{3}{19} & \frac{1}{19} & -\frac{5}{19} \end{pmatrix} \cdot \begin{pmatrix} 12 \\ 1 \\ -4 \end{pmatrix}\\\\ \Rightarrow \begin{pmatrix} x \\ y \\ z \end{pmatrix} &= \begin{pmatrix} 1 \\ 4 \\ 3 \end{pmatrix}.\ _\square \end{align}\]
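In practice, such a system is rarely solved by computing the inverse by hand. Below is a minimal sketch of the same computation; the use of NumPy is an illustrative choice and not part of the example above, and in practice `numpy.linalg.solve` is preferred over forming the inverse explicitly.

```python
import numpy as np

# The coefficient matrix and right-hand side of the system above.
A = np.array([[1, 2, 1],
              [2, -1, 1],
              [1, 1, -3]], dtype=float)
b = np.array([12, 1, -4], dtype=float)

print(np.linalg.inv(A) * 19)   # 19 * A^{-1}: the integer numerators of the inverse shown above
print(np.linalg.solve(A, b))   # [1. 4. 3.], i.e. x = 1, y = 4, z = 3
```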
However, linear algebra extends beyond manipulating equations. Linear methods are commonly used to "linearize" other mathematical objects by reducing or expanding them into a space with linear structure.
Linear algebra is used to solve systems of equations, create linear models, and approximate solutions to equations. Many areas of science and engineering require numerical methods, and linear algebra provides the vast majority of these numerical tools. One could even argue there are no technical subjects that do not benefit from linear algebra. Some fields where linear algebra is especially crucial are data science, structural engineering, machine learning, image processing, linear programming, fluid mechanics, control theory, network flow, and engineering design. In mathematics, linear algebraic methods are used throughout all areas of algebra and many areas of analysis, with far-reaching examples across the theory of modules, representation theory, Fourier analysis, and functional analysis.
The Idea of Linearity
Before the advent of computers, one of the principal roles of scientists and applied mathematicians was to find better approximations for expressions like \(\sin x\). Evaluating \(\sin x\) at, for instance, \(x = 7295\) would require a trip to the library to consult a table of values, and the result would not be carried out to as many digits as is possible today. Students were taught many tricks and algorithms for evaluating non-linear functions like \(\sqrt{x}\), since there was no tool (calculator or computer) to do it for them.
Today, students learn to perform the elementary arithmetic operations, and it is reasonable to expect that a student can reproduce by hand what a calculator does with those basic operations. However, students are much less likely to be able to compute anything beyond them: for instance, \(\sqrt{x}\) or values of \(\sin x\). The point of this story, then, is twofold:
- Students are now taught to think linearly (i.e., primarily in terms of the basic arithmetic operations), and because of this they are much better prepared to study linear algebra than if, say, the operation \(\sin x\) were given as much attention as \(x + y\). Addition is treated as more important than the sine function.
- These linear tools are all that is necessary. Calculators and computers use methods deriving from linear algebra in order to approximate values of \(\sqrt{x}\) or \(\sin x\). Computers think linearly too!
The vector space generated by several vectors is the set of everything one can obtain by performing the usual operations of arithmetic on them (here, this means scalar multiplication and vector addition). Intuitively, this goes hand in hand with the picture of a student combining numbers (or vectors) in any way they can. In so doing, the student will never leave the vector space (the same intuition works for fields).
A common method for calculating (or approximating) \(\sin x\) comes from the Gram-Schmidt process. It provides a method for answering questions like the following: find the quintic polynomial \(p(x)\) that best approximates \(\sin x\) on the interval \([\pi, \, 2\pi]\), in the sense of minimizing the "least squares" error.
The linear algebraist dissects this example by recognizing that the relevant vector space is the space of all real polynomials of degree at most \(5\), equipped with the \(L^2\) inner product on \([\pi, \, 2\pi]\). The question then becomes a direct application of the Gram-Schmidt process, followed by an orthogonal projection onto the resulting basis.
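Below is a sketch of that procedure: Gram-Schmidt applied to the monomials \(1, x, \dots, x^5\) with respect to the \(L^2\) inner product on \([\pi, 2\pi]\), followed by projection of \(\sin x\) onto the resulting orthonormal basis. The integral is approximated by a Riemann sum on a grid; the grid size and the use of NumPy are illustrative choices, not part of the discussion above.

```python
import numpy as np

# Approximate the L^2 inner product <f, g> = integral of f(x) g(x) over [pi, 2*pi]
# by a Riemann sum on a fine grid.
xs = np.linspace(np.pi, 2 * np.pi, 10_001)
dx = xs[1] - xs[0]

def inner(f, g):
    return float(np.sum(f(xs) * g(xs)) * dx)

# Gram-Schmidt on the monomial basis 1, x, ..., x^5 of the degree-at-most-5 polynomials.
ortho = []
for k in range(6):
    mono = np.polynomial.Polynomial.basis(k)   # the monomial x^k
    p = mono
    for e in ortho:
        p = p - inner(mono, e) * e             # subtract the projection onto e
    norm = float(np.sqrt(inner(p, p)))
    ortho.append(p * (1.0 / norm))             # normalize

# Project sin onto the orthonormal basis: the best quintic in the least-squares sense.
best = np.polynomial.Polynomial([0.0])
for e in ortho:
    best = best + inner(np.sin, e) * e

print(best.coef)                              # coefficients of the approximating quintic
print(np.max(np.abs(best(xs) - np.sin(xs))))  # worst-case error on the grid
```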
Vector Spaces
Main Article: Vector Spaces
A vector space can be thought of as a "coordinate system," most often over the real numbers; indeed, some linear algebraic methods are only ever used on the Cartesian coordinate space \(\mathbb{R}^n\). Concretely, such a space is built from several copies of a field, with component-wise addition and no concept of "vector multiplication." Abstractly, a vector space is the collection of all sums of scalar multiples of certain elements, known as a basis, which cannot themselves be combined or reduced further.
A vector space provides a system in which "coordinate mathematics" can work. It also encapsulates the idea of "linearity": the modern definition and construction of vector spaces are the result of centuries of progress in mathematical thought, aimed at solving systems of equations in a simple, scalable manner.
Fields are central to vector spaces. Roughly speaking, a field is a number system similar to the real numbers: the usual rules of arithmetic hold, and both negatives (additive inverses) and reciprocals (multiplicative inverses) exist.
Formally, a field is a set \(\mathbb{F}\) together with addition \(+\) and multiplication \(\times\) operations satisfying the following:
- Associativity. For all \(a, b, c \in \mathbb{F}\), the following hold: \[\begin{align*} a + (b + c) &= (a + b) + c \\ a \times (b \times c) &= (a \times b) \times c. \end{align*}\]
- Commutativity. For all \(a, b \in \mathbb{F}\), the following hold: \[\begin{align*} a + b &= b + a \\ a \times b &= b \times a. \end{align*}\]
- Identities. There exists an element \(0 \in \mathbb{F}\) such that for all \(a \in \mathbb{F}\), \(a + 0 = a\). There exists an element \(1 \in \mathbb{F}\) such that for all \(b \in \mathbb{F}\), \(1 \times b = b\).
- Inverses. (Careful!) For all \(a \in \mathbb{F}\), there is an element \(b_1 \in \mathbb{F}\) such that \(a + b_1 = 0\). For all nonzero \(a \in \mathbb{F}\), there is an element \(b_2 \in \mathbb{F}\) such that \(a \times b_2 = 1\). \((\)Conventionally, \(b_1\) is referred to as \(-a\) and \(b_2\) is referred to as \(a^{-1}.)\)
- Distributivity. For all \(a, b, c \in \mathbb{F}\), the following holds: \[ a \times (b + c) = (a \times b) + (a \times c). \]
Much intuition from working with the real numbers carries over to other fields. For instance, two numbers \(x\) and \(y\) satisfy \(xy = 0\) if and only if \(x = 0\) or \(y = 0\) (or both). However, the following "common sense" statement applies to the real numbers but not to all fields. (Specifically, it applies to all and only fields of characteristic \(0\).)
Let \(x\) be a nonzero element of the field. There is no positive integer \(n\) such that \[\underbrace{x + x + \dots + x}_{n \text{ times}} = n \cdot x = 0.\]
In particular, for any prime \(p\) there is a finite field with \(p\) elements that obeys the same rules as arithmetic modulo \(p\). (In fact, there are finite fields with \(p^k\) elements for any positive integer \(k\), but those are a bit harder to construct.) Then, of course, taking \(x = 1\) and \(n = p\) yields \(1 + 1 + \dots + 1 = p \cdot 1 = 0 \cdot 1 = 0\). Note that any number \(k\) satisfies \(k \equiv - (p - k) \pmod p\), so this is also the same as adding a number to its negative.
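A quick numerical illustration of these two facts, using arithmetic modulo a prime (the prime \(p = 7\) and the Python encoding are illustrative choices):

```python
# Arithmetic modulo a prime p behaves like a field with p elements.
p = 7

# Adding 1 to itself p times gives 0, so this field has characteristic p.
total = 0
for _ in range(p):
    total = (total + 1) % p
assert total == 0

# Every nonzero element has a multiplicative inverse; by Fermat's little theorem
# it is a^(p-2) mod p.
for a in range(1, p):
    inv = pow(a, p - 2, p)
    assert (a * inv) % p == 1
    print(f"{a} * {inv} = 1 (mod {p})")
```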
Analogously, lots of intuition from working with a Cartesian coordinate system carries over to general vector spaces.
Formally, a vector space \(V\) is a set together with a field \(\mathbb{F}\) and two operations, known as vector addition and scalar multiplication:
\[\begin{align*} \text{Vector Addition } + \text{ :}& (v, \, w) \mapsto v + w \\ \text{Scalar Multiplication } \cdot \text{ :}& (\alpha, \, v) \mapsto \alpha \cdot v, \end{align*}\]
where \(v\) and \(w\) are vectors and \(\alpha\) is an element of the field \(\mathbb{F}\). These operations satisfy the following rules:
- Additive associativity. For all \(u, v, w \in V\), \(u + (v + w) = (u + v) + w.\)
- Additive commutativity. For all \(v, w \in V\), \(v + w = w + v.\)
- Additive identity. There exists an element \(0 \in V\) such that \(v + 0 = v\) for all \(v \in V.\)
- Additive inverse. For all \(v \in V\), there exists an element \(w \in V\) such that \(v + w = 0.\)
- Associativity of scalar multiplication. For all \(a, b \in \mathbb{F}\) and \(v \in V\), \(a \cdot (b \cdot v) = (a \times b) \cdot v.\)
- Identity of scalar multiplication. For all \(v \in V\), \(1 v = v\), where \(1\) is the multiplicative identity of \(\mathbb{F}.\)
- Distributivity over vector addition. For all \(a \in \mathbb{F}\) and \(v, w \in V\), \(a \cdot (v + w) = a \cdot v + a \cdot w.\)
- Distributivity over field addition. For all \(a, b \in \mathbb{F}\) and \(v \in V\), \((a + b) \cdot v = a \cdot v + b \cdot v.\)
A linear combination of vectors \(v_1, \, v_2, \, \dots, \, v_k\) in some vector space \(V\) over field \(\mathbb{F}\) is a sum \(a_1 v_1 + \dots + a_k v_k,\) where \(a_i \in \mathbb{F}\) for each \(i = 1, \, 2, \, \dots, \, k.\)
- Consider the space of all complex numbers of the form \(x + iy,\) where \(x\) and \(y\) are real numbers, viewed as a vector space over the real numbers. Note, however, that there is no concept of "complex multiplication" in this space, since vectors cannot be multiplied; as a vector space, it is essentially the same as \(\mathbb{R}^2\), the 2-D Cartesian plane. This explains why complex numbers are normally represented as points in a plane.
- Consider the space of coordinates \((\alpha, \, \beta, \, \gamma)\) where \(\alpha\), \(\beta\), and \(\gamma\) are complex numbers. Equipped with componentwise addition and complex scalar multiplication, this is the vector space \(\mathbb{C}^3\).
- Consider the space of all expressions of the form \(a_1 x_{i_1} + a_2 x_{i_2} + \dots + a_n x_{i_n}\), where \(n\) is a natural number, \(i_1, \, i_2, \, \dots, \, i_n\) are natural numbers, \(x_0, \, x_1, \, x_2, \, \dots\) are formal variables, and \(a_1, \, a_2, \, \dots, \, a_n\) are elements of some field \(\mathbb{F}\). This is an infinite-dimensional vector space.
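As a concrete illustration of the last example, each such expression involves only finitely many variables, so it can be encoded as a finite mapping from variable indices to coefficients. The sketch below uses Python dictionaries with real coefficients; this encoding is a hypothetical choice for illustration, not a construction from the text.

```python
# Represent a_1*x_{i_1} + ... + a_n*x_{i_n} as a dict {variable index: coefficient}.
def add(u, v):
    """Vector addition of two formal expressions."""
    out = dict(u)
    for i, a in v.items():
        out[i] = out.get(i, 0.0) + a
    return {i: a for i, a in out.items() if a != 0.0}

def scale(alpha, u):
    """Scalar multiplication of a formal expression."""
    return {i: alpha * a for i, a in u.items() if alpha * a != 0.0}

u = {0: 2.0, 5: -1.0}    # 2*x_0 - x_5
v = {5: 1.0, 7: 3.0}     # x_5 + 3*x_7
print(add(u, v))         # {0: 2.0, 7: 3.0}, i.e. 2*x_0 + 3*x_7
print(scale(0.5, u))     # {0: 1.0, 5: -0.5}, i.e. x_0 - 0.5*x_5
```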
Bases
Main Article: Bases
Up to this point, the analogy between vector spaces and coordinate systems has not been fully explained. In a coordinate system, there is a dimension and an associated number of coordinates. In a vector space, there can be an analogous concept of dimension, but first two questions need to be asked:
- Do there exist vector spaces whose elements cannot be expressed in terms of coordinates?
- What exactly is meant by "coordinate"?
Which of the following are "bases" in the way described?
- \((2,\,3)\) and \((-1,\,1)\) generate \(\mathbb{R}^2\) over the real numbers.
- \(\dots,\,2^{-2},\,2^{-1},\,2^0,\,2^1,\,2^2,\,\dots\) generate real numbers by using their binary representation over \(\mathbb{F}_2\).
- \(1\) and \(\sqrt{2}\) generate the algebraic numbers over the rationals.
For a set of vectors to serve as "coordinates," there should be only one possible way to express any given vector in terms of them, every vector should have such a representation, and each representation should be finite. In a vector space, sets of vectors that satisfy these constraints must span the space and be linearly independent. Such a set is called a basis, and the plural of "basis" is "bases" (pron. bay-sees).
A set \(\mathbb{S}\) of elements in a vector space is known as linearly independent if there is no linear combination \(a_1 \cdot v_1 + a_2 \cdot v_2 + \dots + a_k \cdot v_k = 0,\) where the \(v_i \in \mathbb{S}\) are distinct, each \(a_i\) is a nonzero element of the underlying field, and \(k\) is a positive integer (necessarily finite). Equivalently, a set of elements is linearly independent if and only if none of its elements can be expressed as a linear combination of the rest. Note that this immediately implies \(\textbf{0}\) cannot be an element of any linearly independent set.
A set \(\mathbb{S}\) of elements in a vector space spans the space (or is known as a spanning set of the vector space) if every element \(v\) of the vector space can be expressed as \(v = a_1 \cdot v_1 + a_2 \cdot v_2 + \dots + a_k \cdot v_k\) for some positive integer \(k\), vectors \(v_i \in \mathbb{S}\), and scalars \(a_i\) from the underlying field.
A basis may be defined, equivalently, in any of the following ways:
- A basis is a set of vectors that is linearly independent and spans the space.
- A basis is a maximal linearly independent set. That is, no element of the vector space can be added to the set to create a larger linearly independent set.
- A basis is a minimal spanning set. That is, no element of the set can be removed to create another set whose elements span the vector space.
A nontrivial vector space typically has many bases, but every basis of a given vector space must have the same number of elements. For instance, \(\mathbb{R}^3\) has many bases: one is the standard one given by the coordinate vectors \((1,\,0,\,0),\) \((0,\,1,\,0)\), and \((0,\,0,\,1)\); another (in terms of the same coordinates) is \((1,\,2,\,1),\) \((2,\,-1,\,1),\) and \((1,\,1,\,-1)\). In general, a finite set of vectors in \(\mathbb{R}^n\) is a basis if and only if the matrix created by placing their coordinate representations in rows is square and invertible, as the sketch below verifies for the second basis.
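A minimal check of the second basis above (the use of NumPy is an illustrative choice):

```python
import numpy as np

# Rows are the candidate basis vectors (1, 2, 1), (2, -1, 1), (1, 1, -1).
candidate = np.array([[1, 2, 1],
                      [2, -1, 1],
                      [1, 1, -1]], dtype=float)

print(np.linalg.det(candidate))          # approximately 9, nonzero, so the matrix is invertible
print(np.linalg.matrix_rank(candidate))  # 3: the rows are linearly independent and span R^3
```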
A vector space \(V\) is said to have dimension \(k\) if it has a basis with \(k\) elements.
Since all bases of a vector space have the same number of elements, the concept of dimension is well-defined. (That is, a vector space cannot have two different dimensions.)
Linear Transformations
Main Article: Linear Transformations
In any area of mathematics, one of the crucial questions to ask about the central objects of study is how they interact with each other. Here, vector spaces are the central objects of study, and linear transformations are the mappings between them.
A linear transformation \(T\) from a vector space \(V\) to a vector space \(W\), both over a field \(\mathbb{F}\), is a function \(T:\, V \to W\) satisfying, for any \(u, v \in V\) and \(\alpha \in \mathbb{F}\), the following two conditions (see the numerical check below):
- \(T(u + v) = T(u) + T(v),\)
- \(T(\alpha v) = \alpha T(v).\)
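As a quick numerical sanity check of these two conditions, the sketch below compares a shear of the plane (which is linear) with a translation (which is not); the specific maps and test vectors are illustrative choices, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.standard_normal(2), rng.standard_normal(2)
alpha = 2.5

shear = lambda x: np.array([x[0] + 3 * x[1], x[1]])   # linear: a shear of the plane
translate = lambda x: x + np.array([1.0, 0.0])        # not linear: it moves the origin

print(np.allclose(shear(u + v), shear(u) + shear(v)))                # True
print(np.allclose(shear(alpha * v), alpha * shear(v)))               # True
print(np.allclose(translate(u + v), translate(u) + translate(v)))    # False
```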
A common setting in which linear transformations arise is the coordinate space \(\mathbb{R}^n\). For example, rotations, reflections, shears, and dilations/contractions of the plane are all linear transformations \(T: \mathbb{R}^2 \to \mathbb{R}^2\). In general, linear transformations from a vector space \(V\) to itself are known as endomorphisms of \(V\).
Every linear transformation \(T: \mathbb{R}^n \to \mathbb{R}^k\) is given by multiplication by a \(k \times n\) matrix.
Consider the images of the standard basis vectors \((1,\,0,\,\dots,\,0),\) \((0,\,1,\,\dots,\,0),\) through \((0,\,0,\,\dots,\,1)\) under the linear transformation \(T\). Since each element of \(\mathbb{R}^n\) is expressible uniquely as a linear combination of these basis vectors, its image is, by linearity, the same linear combination of the images of those basis vectors. Thus, the \(k \times n\) matrix \(M\) formed by placing the \(n\) images side by side as column vectors of length \(k\) satisfies, for any column vector \(v\) in \(\mathbb{R}^n\),
\[T(v) = Mv.\ _\square\]
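The proof is constructive and easy to carry out numerically: apply \(T\) to each standard basis vector and place the images as columns. The sketch below does this for an arbitrary illustrative map \(T: \mathbb{R}^3 \to \mathbb{R}^2\); the specific map and test vector are our choices.

```python
import numpy as np

def T(x):
    # An arbitrary linear map R^3 -> R^2 used for illustration.
    return np.array([x[0] + 2 * x[1] - x[2],
                     4 * x[1] + x[2]])

n = 3
# Columns of M are the images T(e_1), ..., T(e_n) of the standard basis vectors.
M = np.column_stack([T(e) for e in np.eye(n)])
print(M)                          # a 2 x 3 matrix

v = np.array([1.0, -2.0, 5.0])
print(np.allclose(T(v), M @ v))   # True: T acts as multiplication by M
```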
Uses in Mathematics
The methodology employed by linear algebra may be summarized, somewhat reductively, as follows:
- Reduce some aspect of the mathematical object to a collection of linear relationships.
- Be sure to specify the vector space in question: What is its underlying field? What is its dimension or one of its bases?
- Use theorems from linear algebra to answer questions about the object.
Such methodology is now widespread throughout different areas of mathematics, but much of this style of thinking can be traced to the origins of linear algebra. Much of its success should be attributed to the widespread use of linear relations throughout mathematics.
Here is a famous problem from combinatorics that can be solved by knowing that the rank of a matrix product is no greater than the rank of either of its factors.
There are \(n\) citizens living in Oddtown. Their main occupation was forming various clubs, which at some point started threatening the very survival of the city. In order to limit the number of clubs, the city council decreed the following innocent-looking rules:
- Each club must have an odd number of members.
- Every two clubs must have an even number of members in common.
What is the maximum number of clubs Oddtown could possibly have?
On first reading the question, one might observe that (in some sense) there are \(2^n\) possible clubs, one for each possible group of citizens. However, this does not fully take into account the additional restrictions. In order to do that, it is helpful to restate the rules in a more mathematical manner.
There are \(n\) elements in a basis. Their main occupation was forming various vectors, which at some point started threatening the very survival of the vector space. In order to limit the number of vectors, the vector space council decreed the following rules:
- Each vector must have an odd magnitude squared (dot product with itself).
- Every two (distinct) vectors must have an even dot product.
Now that the problem has been restated mathematically, what does it actually mean? What is the underlying field, and what plays the role of the basis? Since the rules only involve the parity of a vector's entries, the problem reduces to modular arithmetic, and the underlying field is \(\mathbb{Z} / 2\mathbb{Z} \cong \mathbb{F}_2\), the field of two elements.
Now, suppose there are \(m\) clubs in Oddtown, represent each club by the vector of its members, and consider the \(m \times n\) matrix \(A\) made by stacking these \(m\) vectors as rows, so that \(A_{i,j} = 1\) if the \(i^\text{th}\) club contains the \(j^\text{th}\) citizen and \(A_{i,j} = 0\) otherwise. The rules imposed by the council appear when considering \(A A^T\) over \(\mathbb{F}_2\): it is an \(m \times m\) matrix with \(1\)'s along the diagonal and \(0\)'s everywhere else, i.e. \(A A^T = \text{Id}_m\). Since a product of matrices has rank no larger than the rank of either factor, \(m = \text{rank}\left(\text{Id}_m\right) = \text{rank}\left(A A^T\right) \le \text{rank}(A) \le n\).
The maximum number of clubs Oddtown could possibly have is \(n\). \(_\square\)
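The argument is easy to check computationally on small examples. The sketch below encodes clubs as \(0\)-\(1\) membership vectors; the town size and the particular clubs are illustrative choices, not part of the problem.

```python
import numpy as np

# Clubs encoded as 0/1 membership vectors over F_2.  Taking the n singleton clubs
# {1}, ..., {n} shows that the maximum m = n is actually attained.
n = 5
clubs = [{0}, {1}, {2, 3, 4}]            # each has odd size; pairwise intersections are even

A = np.zeros((len(clubs), n), dtype=int)
for i, club in enumerate(clubs):
    A[i, list(club)] = 1                 # A[i, j] = 1 iff club i contains citizen j

G = (A @ A.T) % 2                        # the matrix A A^T over F_2
print(G)                                 # the identity: odd sizes give 1's on the diagonal,
                                         # even intersections give 0's off the diagonal
assert np.array_equal(G, np.eye(len(clubs), dtype=int))
# rank(Id_m) = m and rank(A A^T) <= rank(A) <= n, so m <= n.
print(f"m = {len(clubs)} clubs <= n = {n} citizens")
```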
Functional analysis is largely concerned with building a metric/topological structure on top of an infinite-dimensional vector space. These vector spaces are generally called function spaces and are built from sets such as \(\mathcal{C}\big((a, \, b), \, \mathbb{R}\big)\), the space of all continuous real-valued functions on an open interval \((a, \, b)\). This set can be viewed as a vector space over the field \(\mathbb{R}\), and a norm can be defined on it to add topological properties. The core of functional analysis focuses on the so-called \(L^p\)-spaces, which deal with certain classes of integrable functions.