Machine Learning

One of the benefits of least squares regression is that it generalizes easily from its use on two-dimensional scatter plots to 3D or even higher dimensional data.

Previously, we learned that when least squares regression is used on two-dimensional data, the SSE is given by the formula $SSE = \sum_{i=1}^{n} (y_i - mx_i - b)^2.$ This gives us a good idea of what a higher dimensional error function will look like.

If there are $p$ predictor variables $\{x_1, x_2, \ldots, x_p\}$ and one response variable $y,$ then a linear equation which outputs $y$ will take the form $y = m_1x_1 + m_2x_2 + \cdots + m_px_p + b.$

Given this information, what is a reasonable formula for the error when there is more than one predictor variable?

A: $\sum_{i=1}^{n} (y_i - m_1x_{1i} - m_2x_{2i} - \cdots - m_px_{pi} - b)^2$

B: $\sum_{i=1}^{n} (y_i - m_1x_{1i} - m_2x_{2i} - \cdots - m_px_{pi} - b)^p$

C: $\sum_{i=1}^{n} (y_i - m_1x_{1i} \cdot m_2x_{2i} \cdots m_px_{pi} - b)^2$

Remember, the squared error of a single point is the squared difference between the $y$-value and the predicted $y$-value at that point. The SSE for the best-fit function is the sum of the squared errors for each point.
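
To make this concrete, here is a minimal sketch (assuming NumPy; the data points and coefficients are made up for illustration) of how the sum of squared errors of a multi-predictor linear model can be computed:

```python
import numpy as np

def sse(X, y, m, b):
    """Sum of squared errors for the model y ≈ m_1*x_1 + ... + m_p*x_p + b.

    X : (n, p) array of predictor values, one row per data point
    y : (n,) array of observed response values
    m : (p,) array of slope coefficients
    b : intercept
    """
    predictions = X @ m + b        # predicted y-value for every data point
    residuals = y - predictions    # y_i - m_1*x_{1i} - ... - m_p*x_{pi} - b
    return np.sum(residuals ** 2)  # square each error and sum them

# Illustrative data: 4 points, 2 predictors, made-up coefficients
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = np.array([5.0, 4.0, 6.0, 9.0])
print(sse(X, y, m=np.array([1.5, 0.5]), b=1.0))
```

Whichever coefficients minimize this quantity define the best-fit function.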

Higher Dimensional Regression


Earlier, we derived a formula for a best-fit line. Now, we will attempt to modify this formula so that it works for higher dimensional linear regression. Instead of outputting a best-fit line, this formula will now output a best-fit hyperplane: a linear equation in higher dimensions.

In the last chapter, we started our derivation by representing our best-fit equation with a vector. We can do so again with $\vec{x} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_p \\ b \end{bmatrix}.$ Now, we must create a matrix $A$ which, when multiplied with $\vec{x},$ outputs a vector containing the predicted value of $y$ for each data point in the set.

Previously, we did this by making $A$'s first column the $x$-values of all data points and its second column a column of ones. Now, we can achieve the same result in higher dimensions by adding another column to $A$ for each additional predictor variable. This is shown below for a data set with $n$ points and $p$ predictor variables: $$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} & 1 \\ x_{21} & x_{22} & \cdots & x_{2p} & 1 \\ \vdots & & \ddots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} & 1 \end{bmatrix}.$$ Once again, we will also initialize the vector $\vec{b}$ to contain the $y$-values of every data point.

As it turns out, from this point on the derivation is exactly the same as before. We have to find the vector $\vec{x}$ for which $A\vec{x}$ is as close as possible to $\vec{b},$ and once again we can do this by solving the equation $A^T\vec{b} = A^TA\vec{x}.$ After that, we have our answer. The elements of $\vec{x}$ will give the coefficient values for the best-fit hyperplane.
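
To see the whole procedure in one place, here is a minimal sketch (assuming NumPy; the data set is made up and the helper name `fit_hyperplane` is our own) that builds $A$ with a column of ones and then solves the normal equations $A^T\vec{b} = A^TA\vec{x}$ for the coefficients:

```python
import numpy as np

def fit_hyperplane(X, y):
    """Least-squares coefficients [m_1, ..., m_p, b] via the normal equations."""
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])   # one column per predictor, plus a column of ones
    # Solve A^T b = A^T A x for x (assumes A^T A is invertible, i.e. more
    # data points than predictors and no redundant columns).
    return np.linalg.solve(A.T @ A, A.T @ y)

# Illustrative data: 4 points, 2 predictors
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = np.array([5.0, 4.0, 6.0, 9.0])

m1, m2, b = fit_hyperplane(X, y)
print(m1, m2, b)   # coefficients of the best-fit hyperplane y = m1*x1 + m2*x2 + b
```

In practice, a routine such as `np.linalg.lstsq` solves the same least-squares problem more robustly, but the explicit normal equations mirror the derivation above.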


Alfred is back, and this time he's remembered there are multiple types of trees. He's managed to compile a table of the seeds he planted each spring as well as the number of new sprouts each fall. Using this information, identify the matrix $A$ which he needs to create in the process of calculating a best-fit linear equation.

$$\begin{array}{c|c|c} \text{Oak Seeds} & \text{Maple Seeds} & \text{New Growths} \\ \hline 10 & 5 & 9 \\ \hline 4 & 8 & 7 \\ \hline 4 & 3 & 5 \\ \hline 6 & 2 & 4 \end{array}$$


At this point, we can find a best-fit hyperplane for any conceivable data set, as long as there are more data points than predictors. But there’s one major problem. What if the points in a data set are very predictable, but not in a linear fashion?

As it turns out, there is a simple way to expand on our previous model. We can just add new, nonlinear terms to our function and update the rest of our math accordingly.

Generally, this is done by adding powers of the predictor variables, in which case this process is known as polynomial regression.

For instance, say we have a simple data set in which there is one predictor variable $x$ and one response variable $y.$ The only twist is that we suspect $y$ to be best represented by a second-degree polynomial of $x.$

Instead of representing the data with a best-fit line $y = mx + b,$ we should now represent it with a best-fit polynomial $y = m_1x^2 + m_2x + b.$ In many ways, this is the same as creating another predictor variable. We have taken each point in our data set and added another value, $x^2.$ After this step, we can calculate the coefficients as we normally would in higher dimensional linear regression.
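
As a quick sketch of this trick (assuming NumPy; the data values are made up for illustration), we can append an $x^2$ column to the design matrix and then solve the same normal equations as in the linear case:

```python
import numpy as np

# Illustrative one-predictor data that follows a roughly quadratic trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])

# Treat x^2 as an extra predictor: columns are x^2, x, and the constant term
A = np.column_stack([x**2, x, np.ones_like(x)])

# Same normal equations as in higher dimensional linear regression
m1, m2, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m1, m2, b)   # coefficients of the best-fit parabola y = m1*x^2 + m2*x + b
```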


Franklin is in the business of building toy race cars and is analyzing the relationship between the weight and top speed of a car when all else is held equal. So far he’s managed to collect just five data points, but he’s convinced that the relationship should be modeled with a cubic polynomial.

Given the table below, which matrix $A$ must he construct in the process of calculating the best-fit curve? $$\begin{array}{c|c} x & y \\ \hline 5 & 30 \\ \hline 4 & 26 \\ \hline 6 & 20 \\ \hline 3 & 18 \\ \hline 7 & 15 \end{array}$$

1: $A = \begin{bmatrix} 155 & 1 \\ 84 & 1 \\ 258 & 1 \\ 39 & 1 \\ 399 & 1 \end{bmatrix}$

2: $A = \begin{bmatrix} 5 & 5 & 5 & 1 \\ 4 & 4 & 4 & 1 \\ 6 & 6 & 6 & 1 \\ 3 & 3 & 3 & 1 \\ 7 & 7 & 7 & 1 \end{bmatrix}$

3: $A = \begin{bmatrix} 125 & 25 & 5 & 1 \\ 64 & 16 & 4 & 1 \\ 216 & 36 & 6 & 1 \\ 27 & 9 & 3 & 1 \\ 343 & 49 & 7 & 1 \end{bmatrix}$
