# Multivariate Regression

**Multivariate Regression** is a method used to measure the degree to which more than one independent variable (**predictors**) and more than one dependent variable (**responses**) are linearly related. Once a sufficiently strong relationship has been established, the method is broadly used to predict how the response variables change in response to changes in the predictor variables.

**Exploratory Question**: Can a supermarket owner maintain stock of water, ice cream, frozen foods, canned foods and meat as a function of temperature, tornado chance and gas price during tornado season in June?

From this question, several natural assumptions can be drawn: if it is too hot, ice cream sales increase; if a tornado hits, sales of water and canned foods increase while sales of ice cream, frozen foods, and meat decrease; if gas prices increase, prices of all goods increase. A mathematical model based on multivariate regression analysis can address this question and other, more complicated ones.

## Simple Regression

The **Simple Regression** model relates one predictor to one response.

Let the \(n\) observations \((x_1,y_1),(x_2,y_2),\ldots ,(x_n,y_n)\) be pairs of predictors and responses, and let the error terms \(\epsilon_i\sim \mathcal{N}(0,\sigma^2)\) be i.i.d. (independent and identically distributed). For fixed real numbers \(\beta_0\) and \(\beta_1\) (parameters), the model is as follows:

\[y_i=\beta_0+\beta_1 x_i + \epsilon_i\]

The fitted model (fitted to the given data) is as follows:

\[\hat y_i =\hat\beta_0+\hat\beta_1 x_i\]

The estimated parameters are \(\hat\beta_1=\frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sum{(x_i - \bar{x})^2}}\) and \(\hat\beta_0=\bar y - \hat\beta_1 \bar x\), such that \(\bar x\) and \(\bar y\) are the sample averages.
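These closed-form estimators are easy to verify numerically. A minimal sketch, using hypothetical data generated so that \(y = 2 + 3x\) exactly (so the fit recovers the true parameters):

```python
import numpy as np

# Hypothetical data: y = 2 + 3x with no noise, so the fit is exact.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

x_bar, y_bar = x.mean(), y.mean()

# beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values from the estimated parameters
y_fitted = beta0_hat + beta1_hat * x
```

With noisy data the recovered parameters would only approximate the true ones; here the noise-free construction makes the recovery exact.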

**Note**: In most applications, the error terms are assumed to be i.i.d. \(\mathcal{N}(0,\sigma^2)\). In general, however, the error terms need not follow any particular distribution; it suffices to assume \(E(\epsilon_i)=0\), \(Var(\epsilon_i)=\sigma^2\), and \(Cov(\epsilon_i,\epsilon_j)=0\) for \(i\neq j\), i.e., conditions on the expected value, variance, and covariance.

## Multiple Regression

The **Multiple Regression** model relates more than one predictor to one response.

Let \(\textbf{Y}\) be the \(n\times 1\) response vector, and let \(\textbf{X}\) be the \(n\times (q+1)\) matrix whose first column consists entirely of \(1\)'s and whose remaining columns contain the \(q\) predictors. Let \(\boldsymbol{\epsilon}\) be an \(n\times 1\) vector whose entries \(\epsilon_i\sim \mathcal{N}(0,\sigma^2)\) are i.i.d. (independent and identically distributed), and let \(\boldsymbol{\beta}\) be a \((q+1)\times 1\) vector of fixed parameters. The model is as follows:

\[\textbf{Y}=\textbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon}\]

In detailed notation, we have:

\[\begin{pmatrix} y_{1}\\ y_{2}\\ y_{3}\\ \vdots\\ y_{n} \end{pmatrix} = \begin{pmatrix} 1&x_{11}&x_{12}&\ldots&x_{1q}\\ 1&x_{21}&x_{22}&\ldots&x_{2q}\\ 1&x_{31}&x_{32}&\ldots&x_{3q}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n1}&x_{n2}&\ldots&x_{nq} \end{pmatrix} \begin{pmatrix} \beta_{0}\\ \beta_{1}\\ \beta_{2}\\ \vdots\\ \beta_{q} \end{pmatrix} +\begin{pmatrix} \epsilon_{1}\\ \epsilon_{2}\\ \epsilon_{3}\\ \vdots\\ \epsilon_{n} \end{pmatrix} \]
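The matrix form above can be sketched directly in code: build the design matrix \(\textbf{X}\) by prepending a column of \(1\)'s to the predictors, then fit by least squares. The data below are hypothetical and noise-free, so the true \(\boldsymbol{\beta}\) is recovered exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 6, 2

# n x q matrix of predictors; prepend a column of 1's for the intercept,
# giving the n x (q+1) design matrix X.
predictors = rng.normal(size=(n, q))
X = np.column_stack([np.ones(n), predictors])

# Hypothetical true parameters; y = X beta with no noise term for clarity.
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true

# Least squares fit of y = X beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

`np.linalg.lstsq` is used rather than forming \((\textbf{X}^T\textbf{X})^{-1}\) explicitly, which is the numerically preferred route.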

## Multivariate Regression

The **Multivariate Regression** model relates more than one predictor to more than one response.

Let \(\textbf{Y}\) be the \(n\times p\) response matrix, and let \(\textbf{X}\) be the \(n\times (q+1)\) matrix whose first column consists entirely of \(1\)'s and whose remaining columns contain the \(q\) predictors. Let \(\textbf{B}\) be a \((q+1)\times p\) matrix of fixed parameters, and let \(\boldsymbol{\Xi}\) be an \(n\times p\) error matrix whose rows are i.i.d. multivariate normal with mean \(\boldsymbol{0}\) and covariance matrix \(\boldsymbol{\Sigma}\). The model is as follows:

\[\textbf{Y}=\textbf{X}\textbf{B}+\boldsymbol{\Xi}\]

In detailed notation, we have:

\[\begin{pmatrix} y_{11}&y_{12}&\ldots&y_{1p}\\ y_{21}&y_{22}&\ldots&y_{2p}\\ y_{31}&y_{32}&\ldots&y_{3p}\\ \vdots&\vdots&\ddots&\vdots\\ y_{n1}&y_{n2}&\ldots&y_{np}\\ \end{pmatrix} = \begin{pmatrix} 1&x_{11}&x_{12}&\ldots&x_{1q}\\ 1&x_{21}&x_{22}&\ldots&x_{2q}\\ 1&x_{31}&x_{32}&\ldots&x_{3q}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{n1}&x_{n2}&\ldots&x_{nq} \end{pmatrix} \begin{pmatrix} \beta_{01}&\beta_{02}&\ldots&\beta_{0p}\\ \beta_{11}&\beta_{12}&\ldots&\beta_{1p}\\ \beta_{21}&\beta_{22}&\ldots&\beta_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ \beta_{q1}&\beta_{q2}&\ldots&\beta_{qp}\\ \end{pmatrix} +\begin{pmatrix} \epsilon_{11}&\epsilon_{12}&\ldots&\epsilon_{1p}\\ \epsilon_{21}&\epsilon_{22}&\ldots&\epsilon_{2p}\\ \epsilon_{31}&\epsilon_{32}&\ldots&\epsilon_{3p}\\ \vdots&\vdots&\ddots&\vdots\\ \epsilon_{n1}&\epsilon_{n2}&\ldots&\epsilon_{np}\\ \end{pmatrix} \]

The least squares estimator of \(\textbf{B}\), denoted \(\boldsymbol{\hat B}\), is both the maximum likelihood estimator and an unbiased estimator:

\[\boldsymbol{\hat B}=(\boldsymbol{X^T}\boldsymbol{X})^{-1}\boldsymbol{X^T}\boldsymbol{Y}\]

This estimator minimizes the trace of the residual matrix \((\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\hat B})^T(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\hat B})\).

The unbiased estimator for \(\boldsymbol{\Sigma}\), denoted \(\boldsymbol{\hat \Sigma}\), is:

\[\boldsymbol{\hat \Sigma}=\frac{1}{n-q-1}(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\hat B})^T(\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\hat B})\]
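Both estimators can be sketched directly from these formulas. The data below are hypothetical and noise-free, so \(\boldsymbol{\hat B}\) recovers the true parameter matrix exactly and \(\boldsymbol{\hat \Sigma}\) is (numerically) zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 8, 2, 3

# n x (q+1) design matrix with intercept column; (q+1) x p true parameters.
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
B_true = rng.normal(size=(q + 1, p))
Y = X @ B_true  # n x p responses, no noise term for clarity

# B_hat = (X^T X)^{-1} X^T Y; np.linalg.solve avoids an explicit inverse.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Sigma_hat = (Y - X B_hat)^T (Y - X B_hat) / (n - q - 1)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - q - 1)
```

With a nonzero error matrix \(\boldsymbol{\Xi}\), \(\boldsymbol{\hat \Sigma}\) would instead estimate the error covariance \(\boldsymbol{\Sigma}\).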

### Fitted Model

The fitted (prediction) model given by \(\boldsymbol{\hat B}\) is as follows:

\[\boldsymbol{\hat Y}=\boldsymbol{X}\boldsymbol{\hat B}\]

The predicted error (residual matrix) is \(\boldsymbol{\hat \Xi}=\boldsymbol{Y}-\boldsymbol{\hat Y}\).

### Sample Covariance and \(r_{1}^{2}\)

The sample covariance matrix \(\boldsymbol{S}\) is a block matrix with blocks \(\boldsymbol{S_{yy}}\), \(\boldsymbol{S_{yx}}\), \(\boldsymbol{S_{xy}}\), and \(\boldsymbol{S_{xx}}\), and has the following form:

\[\boldsymbol{S}=\begin{pmatrix} \boldsymbol{S_{yy}}&\boldsymbol{S_{yx}}\\ \boldsymbol{S_{xy}}&\boldsymbol{S_{xx}} \end{pmatrix}\]

A measure of the association among the variables of the model, denoted \(r_{1}^{2}\) and ranging between zero and one, is the largest eigenvalue of the following matrix:

\[\boldsymbol{S_{yy}^{-1}}\boldsymbol{S_{yx}}\boldsymbol{S_{xx}^{-1}}\boldsymbol{S_{xy}}\]

**Cite as:** Multivariate Regression. *Brilliant.org*. Retrieved from https://brilliant.org/wiki/multivariate-regression/