Note: This quiz requires some knowledge of linear algebra.
Statistical tools are extremely powerful when analyzing data, but they are not the only option. Linear algebra, for instance, often provides a more intuitive way to view best-fit lines, one that is easier to generalize.
The best-fit line, given by an equation of the form $y = mx + b$, is actually known as the least squares regression line, which means that if we sum the square of the vertical distance from each data point to the best-fit line, the result will be less than it would be for any other line.
This definition allows us to define an error function for any given line, one that outputs the sum of squared errors: the sum of the squares of each point's vertical distance from the line. This quantity is often abbreviated as SSE, and we use it as a measure of how much a regression line deviates from the actual data set. The least squares regression line is the line for which the error function is at its minimum value.
If we put this in mathematical terms, we find the formula $$E(m, b) = \sum_{i=1}^{n} \big(y_i - (mx_i + b)\big)^2.$$ Now, we can think of our best-fit line as the line with the values of $m$ and $b$ which minimize the error function. This is extremely useful because it gives us a concrete set of criteria for our best-fit line which we can expand on to suit our needs. For now, though, we will just work with our new definition in an unaltered state.
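As a quick sanity check, here is how the error function above can be computed in plain Python. The data points and the two candidate lines are made up for illustration:

```python
# Sum of squared errors (SSE) for a candidate line y = m*x + b.
# The data points below are made up for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 6)]

def sse(m, b, points):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

# A line close to the data gives a small SSE...
print(sse(1.4, 0.5, points))   # about 0.2
# ...while a poorly fitting line gives a larger one.
print(sse(0.0, 4.0, points))   # 10.0
```

The best-fit line is the pair $(m, b)$ that drives this function to its minimum.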
There are several points in the scatter plot shown as well as a best-fit line candidate. We have shown the vertical distance from each point to the line. What is the SSE?
One benefit of our new definition is that it gives us an alternative method for calculating our best-fit line, one which uses linear algebra techniques.
Say we have a data set containing $n$ points: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n).$ We need to derive a formula which can give us the least squares regression line for our data set, so a good first step is to put our variables into linear algebra terms.
First, we realize that since every line can be represented by the equation $y = mx + b,$ we can also represent every line with a single, two-dimensional vector that stores the line's slope and y-intercept: $$\mathbf{v} = \begin{bmatrix} m \\ b \end{bmatrix}.$$

Now, we define an $n \times 2$ matrix $A.$ For $1 \le i \le n,$ the $i^\text{th}$ row of $A$ will contain $x_i$ in the first column and $1$ in the second: $$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}.$$

This definition may appear somewhat arbitrary, but it becomes useful when we multiply $A$ by a vector representing a line. For any vector $\mathbf{v} = \begin{bmatrix} m \\ b \end{bmatrix},$ the vector given by $A\mathbf{v}$ will contain the $y$-values the line represented by $\mathbf{v}$ will predict for each $x$-value in our data set: $$A\mathbf{v} = \begin{bmatrix} mx_1 + b \\ mx_2 + b \\ \vdots \\ mx_n + b \end{bmatrix}.$$ In other words, calculating $A\mathbf{v}$ is like feeding the $x$-value from each point into the function represented by $\mathbf{v}.$
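A small NumPy sketch makes the role of $A$ concrete. The $x$-values below are made up for illustration:

```python
import numpy as np

# Illustrative data set: four made-up x-values, so n = 4.
xs = np.array([1.0, 2.0, 3.0, 4.0])

# The n x 2 matrix A: x_i in the first column, 1 in the second.
A = np.column_stack([xs, np.ones_like(xs)])

# A vector v = [m, b] representing the line y = 2x + 1.
v = np.array([2.0, 1.0])

# A @ v feeds each x-value into the line: entry i is m*x_i + b.
predicted = A @ v
print(predicted)  # [3. 5. 7. 9.]
```

Each entry of `predicted` is exactly the $y$-value the line $y = 2x + 1$ assigns to the corresponding $x$-value.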
As a result, if we define a new vector $\mathbf{y}$ so that its $i^\text{th}$ element will be $y_i$ from our data set, we can find the vertical distance between each point and the $y$-value predicted for it by subtracting $A\mathbf{v}$ from $\mathbf{y}$: $$\mathbf{y} - A\mathbf{v} = \begin{bmatrix} y_1 - (mx_1 + b) \\ \vdots \\ y_n - (mx_n + b) \end{bmatrix}.$$
We can now find the SSE by individually squaring the values inside $\mathbf{y} - A\mathbf{v}$ and adding them together. Interestingly, this process is equivalent to squaring the length of $\mathbf{y} - A\mathbf{v},$ so the SSE is equal to the squared distance in $n$-dimensional space between $\mathbf{y}$ and $A\mathbf{v}.$
In other words, $$\text{SSE} = \lVert \mathbf{y} - A\mathbf{v} \rVert^2.$$
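We can verify this equivalence numerically. The sketch below (with made-up data) computes the SSE both ways, element by element and as a squared vector length, and shows they agree:

```python
import numpy as np

# Made-up data set of four points (x_i, y_i).
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 3.0, 5.0, 6.0])

A = np.column_stack([xs, np.ones_like(xs)])  # the n x 2 matrix
v = np.array([1.0, 1.0])                     # the line y = x + 1

residual = ys - A @ v                        # vertical distances
sse_elementwise = np.sum(residual ** 2)      # square each entry, then add
sse_norm = np.linalg.norm(residual) ** 2     # squared length of the vector

print(sse_elementwise, sse_norm)  # the two values agree
```

Either computation gives the SSE, but the squared-length form is the one that lets us reason geometrically in $n$-dimensional space.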
Now, let’s say we have a data set of just three points. Verify the previous findings by using our new formula to calculate the SSE of the given line.
Finally, we are in a position to use linear algebra techniques to find a formula for our best-fit line.
First, we must realize that when we are minimizing the SSE, we are actually minimizing $\lVert \mathbf{y} - A\mathbf{v} \rVert^2.$ This is equivalent to minimizing the distance between $\mathbf{y}$ and $A\mathbf{v},$ since minimizing the square of a positive value will also minimize the value itself.
So, what we need to find is the vector $\mathbf{v}$ for which $A\mathbf{v}$ is as close as possible to $\mathbf{y}.$ To rephrase this question, we need the vector in $A$'s column space which is closest to $\mathbf{y}.$
Suppose we have a column space $C$ in $\mathbb{R}^3,$ a vector $\mathbf{y},$ and $\mathbf{p},$ the point on $C$ closest to $\mathbf{y}.$
It would make sense if $\mathbf{p}$ were equal to $\operatorname{proj}_C \mathbf{y},$ the projection of $\mathbf{y}$ onto $C.$ It turns out there is a simple proof that this is true, but first let's try to find an intuitive reason.
If you shine a light source directly above $\mathbf{y}$ and the plane representing $C,$ then $\mathbf{y}$ casts a shadow onto the plane. This shadow corresponds to the projection of $\mathbf{y}$ onto $C,$ which we call $\operatorname{proj}_C \mathbf{y}.$ The tip of the shadow lies on the plane, and some careful thought should convince you that no point can be closer to the tip of $\mathbf{y}$ and still be on the plane.
We begin by picking an arbitrary point $\mathbf{q}$ on $C.$ To get from $\mathbf{y}$ to $\mathbf{q},$ we can first travel to $\mathbf{p},$ and then travel from $\mathbf{p}$ to $\mathbf{q}.$ Since $\mathbf{p}$ is given by drawing a perpendicular line from $\mathbf{y}$ to $C,$ the segment from $\mathbf{y}$ to $\mathbf{p}$ is perpendicular to the segment from $\mathbf{p}$ to $\mathbf{q},$ so the Pythagorean theorem shows us that $$\lVert \mathbf{y} - \mathbf{q} \rVert^2 = \lVert \mathbf{y} - \mathbf{p} \rVert^2 + \lVert \mathbf{p} - \mathbf{q} \rVert^2.$$ This is depicted in the picture below:
This means that no point on $C$ can be closer to $\mathbf{y}$ than $\mathbf{p},$ and that $\mathbf{p}$ must equal $\operatorname{proj}_C \mathbf{y}.$ In other words, if we draw a perpendicular line from $\mathbf{y}$ to $C,$ the point where it intersects with $C$ will be the point on $C$ closest to $\mathbf{y}.$
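The Pythagorean argument above can be checked numerically. The sketch below uses a deliberately simple made-up setup, the $xy$-plane in $\mathbb{R}^3$ as the column space, and confirms the identity for one choice of $\mathbf{y}$ and $\mathbf{q}$:

```python
import numpy as np

# A plane through the origin in R^3, spanned by two vectors
# (a made-up example: the xy-plane).
B = np.column_stack([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y = np.array([2.0, 3.0, 4.0])

# The orthogonal projection p of y onto the plane.
p = B @ np.linalg.solve(B.T @ B, B.T @ y)

# An arbitrary other point q on the plane.
q = B @ np.array([1.0, -2.0])

# y - p is perpendicular to the plane, so Pythagoras gives
# |y - q|^2 = |y - p|^2 + |p - q|^2, hence p is closest.
lhs = np.linalg.norm(y - q) ** 2
rhs = np.linalg.norm(y - p) ** 2 + np.linalg.norm(p - q) ** 2
print(np.isclose(lhs, rhs))  # True
```

Since $\lVert \mathbf{p} - \mathbf{q} \rVert^2 \ge 0,$ the identity forces $\lVert \mathbf{y} - \mathbf{q} \rVert \ge \lVert \mathbf{y} - \mathbf{p} \rVert$ for every $\mathbf{q}$ on the plane.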
Derive an equation to solve for the optimum $\mathbf{v}.$ Use the fact that $\mathbf{y} - A\mathbf{v}$ is perpendicular to the column space of $A$ when $A\mathbf{v}$ is the point in the column space closest to $\mathbf{y},$ and solve for $\mathbf{v}.$
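The perpendicularity fact the question asks you to use can be illustrated numerically (without spelling out the derivation). In the sketch below, with made-up data, the least-squares $\mathbf{v}$ is found with NumPy's built-in solver, and the residual $\mathbf{y} - A\mathbf{v}$ is then checked against every column of $A$:

```python
import numpy as np

# Made-up data set.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 3.0, 5.0, 6.0])
A = np.column_stack([xs, np.ones_like(xs)])

# The least-squares solution: the v minimizing |y - Av|^2.
v, *_ = np.linalg.lstsq(A, ys, rcond=None)

# At the optimum, the residual y - Av is perpendicular to every
# column of A, i.e. A^T (y - Av) is the zero vector.
residual = ys - A @ v
print(A.T @ residual)  # both entries are (numerically) zero
```

Setting that dot-product condition to zero symbolically, and solving for $\mathbf{v},$ is exactly the derivation the quiz asks for.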