# Confidence Intervals

A **confidence interval** is a region constructed from a sample of fixed size drawn from a population (sample space) that follows some probability distribution. The interval is constructed so as to contain a chosen population parameter, for example the mean \(\mu\), with a prescribed probability. Given a sample of size \(n\), suppose a 95% confidence interval \((a, b)\) is constructed for the mean \(\mu\). This means that if all possible samples of size \(n\) from the population are considered, then 95% of the confidence intervals constructed from those samples will contain \(\mu\). Constructing a confidence interval at a given level does not guarantee that the particular interval obtained contains the true parameter of the population. Rather, if the sample is chosen at random, the procedure has a 95% chance of producing an interval \((a, b)\) that contains \(\mu\). For confidence intervals associated with random samples, the proportion of all such intervals that contain the population parameter is called the **confidence level**. In the previous example the confidence level is \(0.95\).
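The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with hypothetical values \(\mu = 10\) and \(\sigma = 2\), and each interval is the known-variance interval given in the Definitions section. The function name `coverage` and all parameter values are made up for this example.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=30, trials=2000, level=0.95, seed=0):
    """Fraction of simulated known-variance intervals that contain mu."""
    rng = random.Random(seed)
    # z_{alpha/2}: upper-tail critical value of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and build its interval around the sample mean
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# The result should be close to 0.95; it varies slightly with the seed
print(coverage())
```

Any single interval either contains \(\mu\) or does not; the 95% describes the long-run fraction of intervals that do.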

[**Include image** of a \(\mathcal{N}(0,1)\) density with the central 95% area shaded over an interval centered at zero, and the two tails, each of area \(\frac{\alpha}{2}\), beyond \(\pm z_{\frac{\alpha}{2}}\)]


## Sampling Distributions and the Central Limit Theorem

As previously observed, if repeated random samples (with replacement) of the same size are taken from a population, different measures, called statistics, can be computed from each of these samples. For a given statistic, the probability distribution of its values over all samples of size \(n\) is called the sampling distribution of that statistic; for the mean, this is the sampling distribution of the mean. If the population has mean \(\mu \) and variance \(\sigma^2\), then for large values of \(n\) the central limit theorem implies that the sampling distribution of the mean is approximately normal with mean \(\mu \) and variance \(\frac{\sigma^2}{n}\).
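The statement of the central limit theorem above can be checked empirically. The following sketch assumes, purely for illustration, an exponential population with rate 1 (so \(\mu = 1\) and \(\sigma^2 = 1\)), which is clearly not normal. The function name `sample_mean_stats` and the chosen sizes are hypothetical.

```python
import random
from statistics import fmean, pvariance

def sample_mean_stats(n=50, trials=5000, seed=1):
    """Mean and variance of the sample means of `trials` samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # One sample of size n from an exponential population (mu = 1, sigma^2 = 1)
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(fmean(sample))
    return fmean(means), pvariance(means)

mean_of_means, var_of_means = sample_mean_stats()
# Expect values near mu = 1 and sigma^2 / n = 1 / 50 = 0.02
print(mean_of_means, var_of_means)
```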

To simplify calculations with the normal distribution, some more notation is needed. The probability that a random variable \(X\) takes a value of at most a fixed value \(x\) is denoted \(P(X\leq x)\). If the random variable \(X\sim \mathcal{N}(\mu,\sigma^2)\) represents the distribution of interest, the simple transformation \(Z=\frac{X-\mu}{\sigma}\) converts \(X\) into \(Z\sim \mathcal{N}(0,1)\). For a fixed real number \(\alpha\in (0,1) \), \(z_\alpha\) denotes the solution \(z\) of the equation \(P(Z\geq z)=\alpha\). Similarly, for a random variable \(T\) following the t-distribution with \(n-1\) degrees of freedom, \(t_{\alpha,n-1}\) denotes the solution \(t\) of the equation \(P(T\geq t)=\alpha\).
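As a rough illustration of this notation, the standard library class `statistics.NormalDist` can compute \(z_\alpha\) through its inverse cumulative distribution function. The helper names `standardize` and `z_upper` are hypothetical. The standard library has no t-distribution, so \(t_{\alpha,n-1}\) is not covered here.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Map a value of X ~ N(mu, sigma^2) to the corresponding value of Z ~ N(0, 1)."""
    return (x - mu) / sigma

def z_upper(alpha):
    """Return z_alpha, the solution of P(Z >= z) = alpha for Z ~ N(0, 1)."""
    # P(Z >= z) = alpha is equivalent to P(Z <= z) = 1 - alpha
    return NormalDist().inv_cdf(1 - alpha)

print(round(z_upper(0.025), 3))   # z_{0.025}, approximately 1.96
print(standardize(13.0, 10.0, 2.0))
```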

## Definitions

**Estimators and Standard Errors**

Given a sample \(\{x_1, x_2,\ldots , x_n\}\) of size \(n\) from a population with mean \(\mu \) and variance \(\sigma^2 \), the **sample mean** is \(\bar X =\frac{x_1+x_2+\ldots+ x_n}{n}\), which in this setting is also known as the estimator of the population mean. The **sample variance** is \(s^2=\frac{(x_1 -\bar X)^2 +(x_2 -\bar X)^2 +\ldots+ (x_n -\bar X)^2 }{n-1}\). The standard deviation of the sampling distribution of the mean, \(\frac{\sigma}{\sqrt{n}}\), is known as the **standard error**.
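These estimators can be computed directly with Python's `statistics` module. The sketch below uses made-up data. Since \(\sigma\) is usually unknown, it estimates the standard error by replacing \(\sigma\) with \(s\), which is an assumption of this example and not part of the definition above. The function name `estimators` is hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def estimators(sample):
    """Return (sample mean, sample variance, estimated standard error)."""
    n = len(sample)
    x_bar = fmean(sample)
    s2 = variance(sample)        # divides by n - 1, matching the definition of s^2
    se_hat = sqrt(s2 / n)        # s / sqrt(n), an estimate of sigma / sqrt(n)
    return x_bar, s2, se_hat

# Hypothetical measurements, for illustration only
data = [4.1, 5.0, 4.7, 5.3, 4.9]
print(estimators(data))
```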

**\(100(1-\alpha)\)% Confidence Interval for the Population Mean (known variance)**

Given a sample \(\{x_1, x_2,\ldots , x_n\}\) of size \(n\) from a population with mean \(\mu \) and variance \(\sigma^2 \), the **confidence interval** for the population mean with confidence level \(1-\alpha\) associated with the sample is \(\left(\bar X -z_{\frac{\alpha}{2}}\cdot \frac{\sigma}{\sqrt{n}},\ \bar X +z_{\frac{\alpha}{2}}\cdot \frac{\sigma}{\sqrt{n}}\right)\).
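A minimal sketch of this formula in Python, assuming the population standard deviation \(\sigma\) is supplied by the caller. The function name `z_interval` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """Known-variance confidence interval for the population mean."""
    n = len(sample)
    x_bar = fmean(sample)
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample with known sigma = 2
print(z_interval([9.0, 10.0, 11.0, 10.0], sigma=2.0))
```

Raising the confidence level widens the interval, while a larger sample narrows it in proportion to \(\frac{1}{\sqrt{n}}\).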

**\(100(1-\alpha)\)% Confidence Interval for the Population Mean (unknown variance)**

*For large samples, \(z\)-values:* Given a sample \(\{x_1, x_2,\ldots , x_n\}\) of size \(n\) from a population with mean \(\mu \) and variance \(\sigma^2 \), the **confidence interval** for the population mean with confidence level \(1-\alpha\) associated with the sample is \(\left(\bar X -z_{\frac{\alpha}{2}}\cdot \frac{s}{\sqrt{n}},\ \bar X +z_{\frac{\alpha}{2}}\cdot \frac{s}{\sqrt{n}}\right)\), where \(s\) is the sample standard deviation, i.e. the square root of \(s^2\).

*For small samples, \(t\)-values with \(n-1\) degrees of freedom:* Given a sample \(\{x_1, x_2,\ldots , x_n\}\) of size \(n\) from a population with mean \(\mu \) and variance \(\sigma^2 \), the **confidence interval** for the population mean with confidence level \(1-\alpha\) associated with the sample is \(\left(\bar X -t_{\frac{\alpha}{2},n-1}\cdot \frac{s}{\sqrt{n}},\ \bar X +t_{\frac{\alpha}{2},n-1}\cdot \frac{s}{\sqrt{n}}\right)\), where \(s\) is the sample standard deviation, i.e. the square root of \(s^2\).
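The small-sample formula can be sketched as follows. Python's standard library does not provide t-distribution quantiles, so this example assumes the critical value \(t_{\frac{\alpha}{2},n-1}\) is looked up in a t-table and passed in by the caller. The value \(t_{0.025,4} = 2.776\) used below comes from a standard table; the data and the function name `t_interval` are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """Unknown-variance interval; t_crit is t_{alpha/2, n-1} taken from a table."""
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of size 5, so n - 1 = 4 degrees of freedom
data = [4.1, 5.0, 4.7, 5.3, 4.9]
print(t_interval(data, t_crit=2.776))   # 95% interval using t_{0.025, 4}
```

For the same sample, using \(z_{0.025} = 1.96\) in place of \(2.776\) would give a noticeably narrower interval, which is why the \(z\)-value version is reserved for large samples.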

## Discussion of Construction

A closer look at how the interval is constructed gives a deeper understanding of why and how these types of formulas are of interest, not only to statisticians but to any area of study that requires analyzing data to draw conclusions.

Consider the following question:

How does one measure the growth in height of a person at \(\pi\) seconds after midnight on the day after their 21st birthday?

In a theoretical setting, it is known exactly how to deal with this type of question. In practice, however, measuring anything with perfect precision is impossible. Since giving up is not an option, the next best thing is to approximate while minimizing measurement errors. Moreover, these errors are assumed to follow a known probability distribution.

**Construction (Known Variance)**

Consider a sample of size \(n\) drawn from a probability distribution with mean \(\mu\) and variance \(\sigma^2\). The sample mean \(\bar X\) is an estimator for the population mean \(\mu\), and its distribution is approximated by a normal distribution with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\). We are therefore interested in the probability that our estimator is close to the population mean:

\[P(\lvert\bar X - \mu \rvert < \delta)=1-\alpha \]

We must determine the value \(\delta\) that satisfies this equation under our assumptions. Since \(\bar X \) is approximately normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\), it can be standardized so that it follows a \(\mathcal{N}(0,1)\) distribution.

\[\begin{align} 1-\alpha &= P\left (\left | \frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}}\right | < \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right ) \\ &= P\left (\left | Z \right | < \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right ) \\ &= P\left (-\frac{\delta}{\frac{\sigma}{\sqrt{n}}} < Z < \frac{\delta}{\frac{\sigma}{\sqrt{n}}}\right ) \\ \end{align}\]

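By the symmetry of the standard normal distribution, each of the two tails outside this interval has probability \(\frac{\alpha}{2}\), that is, \(P\left(Z \geq \frac{\delta}{\sigma/\sqrt{n}}\right) = \frac{\alpha}{2}\). Following the previously defined notation, we have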

\[ \frac{\delta}{\frac{\sigma}{\sqrt{n}}}=z_{\frac{\alpha}{2}} \]

This implies

\[ \delta=z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} \]

Therefore, in order to obtain an interval with confidence level \( 1 - \alpha \) we must have,

\[ - \delta < \bar X - \mu < \delta \]

\[ - z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} < \bar X - \mu < z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} \]

\[ \bar X - z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar X + z_{\frac{\alpha}{2}} \cdot \frac{\sigma}{\sqrt{n}} \]
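The key step of the construction, namely that the central region between \(-z_{\frac{\alpha}{2}}\) and \(z_{\frac{\alpha}{2}}\) has probability \(1-\alpha\) under \(\mathcal{N}(0,1)\), can be verified numerically. This is only a sanity check using the standard library, and the function name `central_probability` is hypothetical.

```python
from statistics import NormalDist

def central_probability(alpha):
    """P(-z_{alpha/2} < Z < z_{alpha/2}) for Z ~ N(0, 1)."""
    Z = NormalDist()
    z = Z.inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    return Z.cdf(z) - Z.cdf(-z)

for alpha in (0.10, 0.05, 0.01):
    # Each value should agree with 1 - alpha up to floating-point error
    print(alpha, central_probability(alpha))
```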

## Examples and Problems

The following table shows the most common confidence levels together with their corresponding z-values, used to construct confidence intervals.

| Confidence Level \(100(1-\alpha)\%\) | \(\alpha\) | \(\frac{\alpha}{2}\) | \(z_{\frac{\alpha}{2}}\) |
| --- | --- | --- | --- |
| \(90\%\) | \(0.10\) | \(0.050\) | \(1.645\) |
| \(95\%\) | \(0.05\) | \(0.025\) | \(1.960\) |
| \(99\%\) | \(0.01\) | \(0.005\) | \(2.576\) |

Tables of \(t\)-values are more involved, because the critical value depends on the number of degrees of freedom as well as on the confidence level.

A veterinarian is studying a particular side effect of a new heartworm medication for dogs: patches of hair loss in the subject. The manufacturer offered data for a sample of size 50 from a population of 1,000 subjects. The side effect became prevalent if the dosage was over 330 mcg of active ingredient, and the drug became ineffective if the dosage was under 260 mcg of active ingredient; these two values are the endpoints of a 95% confidence interval computed from the sample of 50 subjects. What is the population variance?

**Solution**

According to the problem, the interval \((260,330)\) is a 95% confidence interval for the population mean dosage. Since confidence intervals are constructed with the sample mean at their center, \(\bar X=\frac{260+330}{2}=295\). Also, we must have \(\bar X + z_{0.025} \cdot \frac{\sigma}{\sqrt{50}}=330 \), which implies \(\sigma^2=50\cdot\left ( \frac{330-\bar X}{z_{0.025}} \right )^2 \). Therefore the population variance is \(\sigma^2=50\cdot\left ( \frac{330-295}{1.96} \right )^2\approx 15943.88 \).
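The arithmetic of this solution can be reproduced with a few lines of Python. The sketch uses the tabulated value \(z_{0.025} = 1.96\), as in the solution, and the function name `variance_from_interval` is hypothetical.

```python
from math import sqrt

def variance_from_interval(lower, upper, n, z):
    """Recover the sample mean and population variance from a z-interval."""
    x_bar = (lower + upper) / 2            # the interval is centered at the sample mean
    sigma = (upper - x_bar) * sqrt(n) / z  # from x_bar + z * sigma / sqrt(n) = upper
    return x_bar, sigma ** 2

x_bar, var = variance_from_interval(260, 330, n=50, z=1.96)
print(x_bar, round(var, 2))
```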

According to data from NOAA (National Oceanic and Atmospheric Administration), the mean monthly sea level fluctuation in the San Francisco Bay Area has a \(95\%\) confidence interval of \((1.75,2.13)\) mm/year. If the standard deviation of the fluctuation is \(3.6478\) mm/year, how many years did the data cover?
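One possible approach to this problem is sketched below. It rests on assumptions the problem does not state: that the interval is the known-variance \(z\)-interval, that \(3.6478\) is the population standard deviation of the individual monthly observations, and that there is exactly one observation per month. Under those assumptions the half-width \(z_{0.025}\cdot\frac{\sigma}{\sqrt{n}}\) can be solved for \(n\). The function name `sample_size_from_interval` is hypothetical.

```python
def sample_size_from_interval(lower, upper, sigma, z):
    """Solve z * sigma / sqrt(n) = half-width for n (not rounded)."""
    half_width = (upper - lower) / 2
    return (z * sigma / half_width) ** 2

# z_{0.025} = 1.96 from the table of common confidence levels
n_months = sample_size_from_interval(1.75, 2.13, sigma=3.6478, z=1.96)
years = n_months / 12   # assumes one observation per month
print(round(n_months), round(years))
```

With these assumptions the computation gives roughly 1416 monthly observations, or about 118 years of data.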


**Cite as:** Confidence Intervals. *Brilliant.org*. Retrieved from https://brilliant.org/wiki/confidence-intervals/