Machine Learning

Note: This quiz requires some knowledge of linear algebra.

Statistical tools are extremely powerful for analyzing data, but they are not the only tools. Linear algebra, for instance, often presents a more intuitive view of best-fit lines, and one that is easier to generalize.

The best-fit line given by the equation $y - \overline{y} = \frac{r\,SD_y}{SD_x}(x-\overline{x})$ is actually known as the least squares regression line, which means that if we sum the squares of the vertical distances from each data point to the best-fit line, the result will be less than it would be for any other line.
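As a rough illustration of the formula above, the sketch below computes the slope $r\,SD_y/SD_x$ and the matching intercept in plain Python. The function name `regression_line` and the sample data are hypothetical, and the standard deviations are taken as population values (dividing by $n$), which is an assumption; the slope is the same either way as long as $r$, $SD_x$, and $SD_y$ use the same convention.

```python
from math import sqrt

def regression_line(xs, ys):
    """Return (slope, intercept) of y - y_bar = (r * SD_y / SD_x) * (x - x_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population standard deviations (divide by n); an assumption of this sketch.
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    # Correlation coefficient r, using the same divide-by-n convention.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    slope = r * sd_y / sd_x
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
slope, intercept = regression_line(xs, ys)
print(slope, intercept)
```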

This definition allows us to define an error function for any given line $y = mx+b$ which outputs the sum of squared errors, or the sum of the squares of each point’s vertical distance from the line. This quantity is often abbreviated as SSE. We use this as a measure of how much a regression line deviates from the actual data set. The least squares regression line is the line for which the error function is at its minimum value.

If we put this in mathematical terms, we find the formula $$SSE = \sum_{i=1}^{n} (y_i - mx_i - b)^2.$$ Now, we can think of our best-fit line as the line with values of $m$ and $b$ which minimize the error function. This is extremely useful because it gives us a concrete set of criteria for our best-fit line which we can expand on to suit our needs. For now, though, we will just work with our new definition in an unaltered state.
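The error function can be written directly from the sum. The following is a minimal sketch; the name `sse`, the sample points, and the two candidate lines are all hypothetical and serve only to show that different lines give different errors.

```python
def sse(points, m, b):
    """Sum of squared vertical distances from each (x, y) point to y = m*x + b."""
    return sum((y - m * x - b) ** 2 for x, y in points)

# Hypothetical data set and two candidate lines.
points = [(0, 1), (1, 3), (2, 4)]
error_a = sse(points, 1.5, 1.0)  # candidate line y = 1.5x + 1
error_b = sse(points, 2.0, 0.0)  # candidate line y = 2x
print(error_a, error_b)
```

The candidate with the smaller value deviates less from the data, and the least squares regression line is the one for which no other choice of $m$ and $b$ gives a smaller value.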

Linear Algebra in Linear Regression


There are several points in the scatter plot shown as well as a best-fit line candidate. We have shown the vertical distance from each point to the line. What is the SSE?

One benefit of our new definition is that it gives us an alternative method for calculating our best-fit line, one which uses linear algebra techniques.

Say we have a data set containing $n$ points: $(x_1, y_1), (x_2, y_2),\ldots, (x_n, y_n).$ We need to derive a formula which can give us the least squares regression line for our data set, so a good first step is to put our variables into linear algebra terms.

First, we realize that since every line can be represented by the equation $y = mx + b,$ we can also represent every line with a single, two-dimensional vector that stores the line's slope and $y$-intercept: $$\vec{x} = \begin{bmatrix} m \\ b \end{bmatrix}.$$

Now, we define an $n \times 2$ matrix $A.$ For $1 \leq i \leq n,$ the $i^\text{th}$ row of $A$ will contain $x_i$ in the first column and 1 in the second: $$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}.$$

This definition may appear somewhat arbitrary, but it becomes useful when we multiply $A$ by a vector representing a line. For any vector $\vec{x} = \begin{bmatrix} m \\ b \end{bmatrix},$ the vector given by $A\vec{x}$ will contain the $y$-values that the line represented by $\vec{x}$ predicts for each $x$-value in our data set. In other words, calculating $A\vec{x}$ is like feeding the $x$-value from each point into the function represented by $\vec{x}.$
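To make the role of $A$ concrete, here is a small sketch that builds the matrix as a list of rows and multiplies it by a line vector. The helper names `design_matrix` and `mat_vec` and the sample values are hypothetical; plain Python lists stand in for matrices to keep the example dependency-free.

```python
def design_matrix(xs):
    """Build the n x 2 matrix A whose i-th row is [x_i, 1]."""
    return [[x, 1.0] for x in xs]

def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * c for a, c in zip(row, v)) for row in A]

# Hypothetical x-values and a line y = 2x + 1 stored as [m, b].
xs = [0.0, 1.0, 2.0]
A = design_matrix(xs)
line = [2.0, 1.0]
predictions = mat_vec(A, line)  # i-th entry is m * x_i + b
print(predictions)
```

Each entry of `predictions` is the $y$-value the line assigns to the corresponding $x_i$, which is exactly what the product $A\vec{x}$ represents.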

As a result, if we define a new vector $\vec{b}$ so that its $i^\text{th}$ element will be $y_i$ from our data set, we can find the vertical distance between each point and the $y$-value predicted for it by subtracting $\vec{b}$ from $A\vec{x}.$

We can now find the SSE by individually squaring the values inside $A\vec{x}-\vec{b}$ and adding them together. Interestingly, this process is equivalent to squaring the length of $A\vec{x}-\vec{b},$ so the SSE is equal to the squared distance in $n$-dimensional space between $A\vec{x}$ and $\vec{b}.$

In other words, $$SSE = \Big|\Big|A\vec{x}-\vec{b}\Big|\Big|^2.$$

Now, let’s say we have a data set of just three points: $(1, 0), (3, 4), (2, 3).$ Verify the previous findings by using our new formula to calculate the SSE of the line $y = 3x + 2.$
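One way to check the verification by machine is to compute the SSE both ways and compare. This is only a sketch: the variable names are hypothetical, and plain lists stand in for $A$, $\vec{x}$, and $\vec{b}$.

```python
points = [(1, 0), (3, 4), (2, 3)]
slope, intercept = 3, 2  # the line y = 3x + 2

# Matrix form: A has rows [x_i, 1], b_vec holds the y_i, x_vec holds [m, b].
A = [[x, 1] for x, _ in points]
b_vec = [y for _, y in points]
x_vec = [slope, intercept]

# Entries of A x - b, then the squared length of that vector.
residual = [sum(a * c for a, c in zip(row, x_vec)) - y for row, y in zip(A, b_vec)]
sse_norm = sum(r * r for r in residual)

# Direct form: sum of (y_i - m x_i - b)^2.
sse_sum = sum((y - slope * x - intercept) ** 2 for x, y in points)

print(sse_norm, sse_sum, sse_norm == sse_sum)
```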

Finally, we are in a position to use linear algebra techniques to find a formula for our best-fit line.

First, we must realize that when we are minimizing the SSE, we are actually minimizing $||A\vec{x}-\vec{b}||^2.$ This is equivalent to minimizing the distance between $A\vec{x}$ and $\vec{b}$: a distance is never negative, so whatever minimizes its square also minimizes the distance itself.

So, what we need to find is the vector $\vec{x}$ for which $A\vec{x}$ is as close as possible to $\vec{b}$. To rephrase this question, we need the vector in $A$’s column space which is closest to $\vec{b}$.

Suppose we have a column space $W$ in $\mathbf{R}^3$, a vector $\vec{b}$, and $A\vec{x}$, the point on $W$ closest to $\vec{b}.$

It would make sense if $A\vec{x}$ were equal to $\hat{b}$, the projection of $\vec{b}$ onto $W.$ It turns out there is a simple proof that this is true, but let's first try to find an intuitive reason.

If you shine a light source directly above $\vec{b}$ and the plane representing $W,$ then $\vec{b}$ casts a shadow onto the plane. This shadow corresponds to the projection of $\vec{b}$ onto $W,$ which we call $\hat{b}.$ The tip of the shadow lies on the plane, and some careful thought should convince you that no other point on the plane $W$ can be closer to the tip of $\vec{b}.$

We begin by picking an arbitrary point $\vec{v}$ on $W$. To get from $\vec{v}$ to $\vec{b}$, we can first travel to $\hat{b}$, and then travel perpendicularly from $\hat{b}$ to $\vec{b}$. Since $\hat{b}$ is given by drawing a perpendicular line from $\vec{b}$ to $W,$ these two legs meet at a right angle, so the Pythagorean theorem shows us that $$||\vec{v}-\vec{b}||^2 = ||\hat{b}-\vec{v}||^2 + ||\vec{b}-\hat{b}||^2.$$ This is depicted in the picture below:

This means that no point on $W$ can be closer to $\vec{b}$ than $\hat{b}$, and that $A\vec{x}$ must equal $\hat{b}$. In other words, if we draw a perpendicular line from $\vec{b}$ to $W,$ the point where it intersects $W$ will be the point on $W$ closest to $\vec{b}$.
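The Pythagorean argument can be checked numerically in a simple case. The sketch below assumes, purely for illustration, that $W$ is the plane $z = 0$ in $\mathbf{R}^3$, so projecting onto $W$ just zeroes the third coordinate; the vector $\vec{b}$ and the candidate points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - c) ** 2 for a, c in zip(p, q))

# Assumption for this sketch: W is the plane z = 0.
b = (1.0, 2.0, 3.0)
b_hat = (1.0, 2.0, 0.0)  # projection of b onto W

# Arbitrary points v on W (third coordinate is 0).
candidates = [(4.0, -1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 2.5, 0.0)]

checks = []
for v in candidates:
    direct = sq_dist(v, b)                                # ||v - b||^2
    via_projection = sq_dist(b_hat, v) + sq_dist(b, b_hat)  # ||b_hat - v||^2 + ||b - b_hat||^2
    checks.append((direct, via_projection))
    print(v, direct, via_projection)

# Among the sampled points of W, the projection is the closest to b.
closest = min(candidates + [b_hat], key=lambda v: sq_dist(v, b))
print(closest)
```

Because $||\hat{b}-\vec{v}||^2$ is never negative, the two printed quantities agree for every $\vec{v}$ and the distance to $\vec{b}$ is smallest when $\vec{v} = \hat{b}$.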

Derive an equation to solve for the optimum $\vec{x}$. Use the fact that $A\vec{x}$ is perpendicular to $\vec{b}-A\vec{x}$ when $\vec{x}$ is the closest to a solution for $A\vec{x} = \vec{b}$.
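One possible route, sketched here rather than given as the intended answer: since $\vec{b}-A\vec{x}$ is perpendicular to $W$, it is perpendicular to each column of $A$, which can be written as $A^T(\vec{b}-A\vec{x}) = \vec{0}$, or equivalently $A^TA\vec{x} = A^T\vec{b}$. For the $n \times 2$ matrix $A$ defined earlier this is a $2 \times 2$ linear system, which the code below solves by hand. The function name `least_squares_line` is hypothetical, and the sketch assumes the $x$-values are not all equal so that $A^TA$ is invertible.

```python
def least_squares_line(points):
    """Solve A^T A x = A^T b for x = [m, b], where A has rows [x_i, 1]."""
    n = len(points)
    sxx = sum(x * x for x, _ in points)
    sx = sum(x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    sy = sum(y for _, y in points)
    # A^T A = [[sxx, sx], [sx, n]] and A^T b = [sxy, sy].
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x-values are equal; A^T A is not invertible")
    m = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b

points = [(1, 0), (3, 4), (2, 3)]
m, b = least_squares_line(points)

# The residual b_vec - A x should be perpendicular to both columns of A.
residual = [y - (m * x + b) for x, y in points]
dot_with_x_column = sum(x * r for (x, _), r in zip(points, residual))
dot_with_ones_column = sum(residual)
print(m, b, dot_with_x_column, dot_with_ones_column)
```

The two dot products come out as zero up to rounding, which is the perpendicularity condition the derivation starts from.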
