In most real-world cases, systems of linear equations won’t have a solution because they are overdetermined: there are more equations than variables. This happens most often in modeling, such as when one tries to find a relationship between two variables (say, height and weight). It is very rare for such variables to have a perfectly linear (or quadratic, or exponential, etc.) relationship, but there is often a relationship that is “close” to linear. This is especially important in machine learning, where finding a perfect relationship is often actually *undesirable*, because it may not generalize well to new data.

Throughout this chapter, we’ll thus explore finding *approximate* solutions to the matrix equation \(Ax = b\).

We’ll measure how good a candidate solution \(x\) is by its residual norm, the **least squares loss** (minimizing it is equivalent to minimizing the squared norm \(\lVert Ax - b \rVert^2\)):
\[R(x) = \lVert Ax - b \rVert.\]
Let’s start with a small example. Suppose
\[A = \begin{pmatrix}1 & 1 \\ 1 & 2 \\ 1 & 3\end{pmatrix}, b = \begin{pmatrix}5 \\ 0 \\ 0\end{pmatrix}.\]
Obviously we can’t satisfy all three equations. If we tried to satisfy just the first two, we’d get \(x_1 = \begin{pmatrix}10\\-5\end{pmatrix}\). Alternatively, we could try \(x_2 = \begin{pmatrix}7 \\ -3\end{pmatrix}\). What are \(R(x_1)\) and \(R(x_2)?\)
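Before reading on, it may help to check these residuals numerically. Here is a quick sketch (assuming NumPy is available):

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([5, 0, 0], dtype=float)

def R(x):
    """Residual norm ||Ax - b|| for a candidate solution x."""
    return np.linalg.norm(A @ x - b)

x1 = np.array([10, -5], dtype=float)  # satisfies the first two equations exactly
x2 = np.array([7, -3], dtype=float)

print(R(x1))  # 5.0, since the residual vector is (0, 0, -5)
print(R(x2))  # sqrt(6), since the residual vector is (-1, 1, -2)
```

Notice that \(x_2\), which satisfies none of the equations exactly, nevertheless has the smaller residual.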

Remember, we're trying to solve the system \[A^TA\begin{pmatrix}c_1\\c_2\end{pmatrix} = A^Tb.\] We arrived at the system \[\begin{pmatrix}3&6\\6&14\end{pmatrix}\begin{pmatrix}c_1\\c_2\end{pmatrix} = \begin{pmatrix}5\\5\end{pmatrix},\] so if we can show that \(A^TA = \begin{pmatrix}3&6\\6&14\end{pmatrix}\) and \(A^Tb = \begin{pmatrix}5\\5\end{pmatrix}\), we'll be finished. Consider the columns of \(A\), \[\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}1\\2\\3\end{pmatrix},\] and note that \[\begin{align} \begin{pmatrix}1\\1\\1\end{pmatrix} \cdot \begin{pmatrix}1\\1\\1\end{pmatrix} &= 3, \begin{pmatrix}1\\1\\1\end{pmatrix} \cdot \begin{pmatrix}1\\2\\3\end{pmatrix} = 6\\ \begin{pmatrix}1\\2\\3\end{pmatrix} \cdot \begin{pmatrix}1\\1\\1\end{pmatrix} &= 6, \begin{pmatrix}1\\2\\3\end{pmatrix} \cdot \begin{pmatrix}1\\2\\3\end{pmatrix} = 14, \end{align}\] which are precisely the entries of \(A^TA\): the \((i,j)\) entry of \(A^TA\) is the dot product of the \(i\)th and \(j\)th columns of \(A\). Similarly, the entries of \(A^Tb\) are the dot products of the columns of \(A\) with \(b\): \[\begin{pmatrix}1\\1\\1\end{pmatrix} \cdot \begin{pmatrix}5\\0\\0\end{pmatrix} = 5, \quad \begin{pmatrix}1\\2\\3\end{pmatrix} \cdot \begin{pmatrix}5\\0\\0\end{pmatrix} = 5.\] So the system we solved was indeed \(A^TA\begin{pmatrix}c_1\\c_2\end{pmatrix} = A^Tb\).
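As a sanity check, we can solve the \(2\times 2\) normal equations directly and compare against NumPy's built-in least-squares routine, which minimizes \(\lVert Ac - b\rVert\) without forming \(A^TA\). A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([5, 0, 0], dtype=float)

# Form and solve the normal equations A^T A c = A^T b.
c = np.linalg.solve(A.T @ A, A.T @ b)
print(c)  # [20/3, -5/2], i.e. the best-fit values c1 = 20/3, c2 = -5/2

# np.linalg.lstsq minimizes ||Ac - b|| directly and should agree.
c_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(c, c_lstsq))  # True
```

For larger or ill-conditioned problems, `lstsq` (which uses an SVD) is numerically preferable to forming \(A^TA\) explicitly, but the two agree here.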

We’ve now seen how to get the “best” solution to a system of linear equations given by \(Ax = b\): solve \(A^TAx = A^Tb\). This finds its main application in modeling data.

For instance, suppose we have some height to weight data in the form of \(n\) ordered pairs \((h_i, w_i)\), and want to model this as a line (so we want to write weight as a linear function of height). The first step is to formulate this as a least-squares problem. Which of the following makes the most sense for \(A\)?

In this quiz, we looked at the problem of finding the “best” solution when a perfect solution doesn’t exist. The metric we used was the least squares distance, i.e. finding the \(x\) that minimizes \(\lVert Ax - b\rVert\) (of course, this is 0 when \(Ax = b\), so a perfect solution, if one exists, would still be found). As we saw, finding the least squares solution is equivalent to solving the matrix equation \[A^TAx = A^Tb.\] This finds its most direct application in fitting data to a line, which is extremely common in the real world. As we saw, this can be solved directly: given \(n\) points \((x_i, y_i)\), first shift the \(x\)-values so that \(\sum x_i = 0\); then the best-fit line \(y = c_1 + c_2x\) has \[c_1 = \frac{\sum y_i}{n}, \qquad c_2 = \frac{\sum x_iy_i}{\sum x_i^2}.\]
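The closed-form recipe above can be sketched in a few lines. The data here is made up for illustration, and NumPy is assumed:

```python
import numpy as np

# Hypothetical (x_i, y_i) data, roughly linear with a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Shift the x-values so that sum(x_i) = 0, as in the derivation.
x_shifted = x - x.mean()

c1 = y.sum() / len(y)                               # intercept in shifted coordinates
c2 = (x_shifted * y).sum() / (x_shifted ** 2).sum() # slope

# Undo the shift: y = c1 + c2*(x - mean) = (c1 - c2*mean) + c2*x.
intercept = c1 - c2 * x.mean()
print(c2, intercept)  # 1.96 0.14

# Compare against solving the normal equations A^T A c = A^T y directly.
A = np.column_stack([np.ones_like(x), x])
sol = np.linalg.solve(A.T @ A, A.T @ y)
print(np.allclose([intercept, c2], sol))  # True
```

Shifting the \(x\)-values makes the columns of \(A\) orthogonal, which is why \(c_1\) and \(c_2\) decouple into the two separate formulas.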
