Box and Whisker Plot
A box and whisker plot is a representation of statistical data that displays information about the distribution of the values. Here's an example of a box and whisker plot:
In the image above, we can compare the distributions of mathematics scores in Armenia and Hungary.
Introduction
Each mark on the box and whisker plot shows a different part of the distribution of the data. The central mark shows the median, the end marks show the lowest and highest values, and the central box shows the middle 50% of the data: all the values between the first quartile and the third quartile. The length of this box (the third quartile minus the first quartile) is called the interquartile range (IQR).
Box and whisker plots allow us to quickly compare two data sets visually, although they do not include enough information to completely characterize a data set.
Quartiles and the Five Number Summary
To produce a box and whisker plot, we need the five number summary. This set of values provides five important statistics about the data set: the minimum, first quartile, median, third quartile, and maximum. These five values split the data into four groups, each containing roughly 25% of the data points.
We find the first and third quartiles by calculating the medians of the lower and upper halves of the data, respectively. This is straightforward when the data set has an even number of values. When it has an odd number of values, however, there is no universally accepted method: the conventions disagree over whether the median itself should be included in both the lower and upper halves or left out of both.
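As a rough illustration of the two conventions, here is a minimal Python sketch using only the standard library's `statistics.median`. The function name `quartiles`, its `include_median` flag, and the seven-value data set are hypothetical choices made for this example.

```python
from statistics import median


def quartiles(values, include_median=False):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    include_median only matters for an odd number of values: if True, the
    middle value is placed in both halves; if False, it is left out of both.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(upper)


odd_data = [1, 2, 4, 7, 8, 9, 12]                # hypothetical example set
print(quartiles(odd_data))                       # (2, 9): middle value 7 left out
print(quartiles(odd_data, include_median=True))  # (3.0, 8.5): 7 in both halves
```

For an even number of values the flag has no effect, so both conventions give the same quartiles for a set like \(\{2, 3, 5, 5, 7, 10, 10, 11\}\).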
Let's find the five number summary for the data set \(\{2, 3, 5, 5, 7, 10, 10, 11\}.\)
The minimum value is 2 and the maximum value is 11. Since the set has an even number of values, the median is the average of the two middle values, 5 and 7, which is 6. The first quartile is the median of the lower half \(\{2, 3, 5, 5\}\), which is 4, and the third quartile is the median of the upper half \(\{7, 10, 10, 11\}\), which is 10.
Therefore, the five number summary is 2, 4, 6, 10, 11.
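The same calculation can be sketched in Python. This is a minimal example that assumes the median-of-halves method described above, with the middle value left out of both halves when the count is odd; `five_number_summary` is a hypothetical helper name, not a library function.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Quartiles are the medians of the lower and upper halves; for an odd
    number of values the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower, upper = data[:half], data[n - half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary([2, 3, 5, 5, 7, 10, 10, 11]))  # (2, 4.0, 6.0, 10.0, 11)
```

Applied to \(\{1, 3, 3, 6, 6, 7, 7, 9\}\), the same function returns `(1, 3.0, 6.0, 7.0, 9)`.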
We create a box and whisker plot of a data set by plotting the five values from the five number summary above a number line.
Now, we draw a line segment through the five points, a box from the first quartile to the third quartile, and a vertical line at the median.
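To make these drawing steps concrete without a plotting library, the sketch below renders a rough text version of the plot above a number line. The function `render_box_plot`, its parameters, and the characters chosen for each mark (`|` for the whisker ends and the median, `[` and `]` for the box edges, `+` for axis ticks) are hypothetical choices for illustration only.

```python
def render_box_plot(summary, start, stop, width=25, tick_step=2):
    """Return a text box and whisker plot drawn above a number line.

    summary is (minimum, Q1, median, Q3, maximum); start and stop bound the
    axis, and tick_step is the spacing between labelled ticks.
    """
    lo, q1, med, q3, hi = summary
    scale = (width - 1) / (stop - start)

    def col(x):
        # Map a data value to the nearest character column on the axis.
        return round((x - start) * scale)

    # Plot row: a line segment from min to max, a box from Q1 to Q3,
    # and a vertical mark at the median.
    row = [" "] * width
    for c in range(col(lo), col(hi) + 1):
        row[c] = "-"
    for c in range(col(q1) + 1, col(q3)):
        row[c] = " "  # leave the inside of the box empty
    row[col(lo)] = row[col(hi)] = "|"
    row[col(q1)], row[col(q3)] = "[", "]"
    row[col(med)] = "|"

    # Number line with a "+" tick and a label every tick_step units.
    axis = ["-"] * width
    labels = ""
    value = start
    while value <= stop:
        axis[col(value)] = "+"
        labels = labels.ljust(col(value)) + str(value)
        value += tick_step
    return "\n".join(["".join(row).rstrip(), "".join(axis), labels])


print(render_box_plot((2, 4, 6, 10, 11), start=0, stop=12))
```

With the five number summary 2, 4, 6, 10, 11 on an axis from 0 to 12, the median mark lands above the 6 label, the box runs from 4 to 10, and the right whisker ends at 11, between the 10 and 12 labels. Because values are rounded to whole character columns, placement is only approximate for other scales.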
Construct a box and whisker plot for the data set \(\{1, 3, 3, 6, 6, 7, 7, 9\}.\)
First, we must find the five number summary for the data set:
- Minimum: 1
- First Quartile: the median of \(\{1,3,3,6\}\), which is 3
- Median: the mean of 6 and 6, which is 6
- Third Quartile: the median of \(\{6, 7, 7, 9\}\), which is 7
- Maximum: 9
Then, we use that information to construct a box and whisker plot:
Outliers
When there are outliers in the data, a box and whisker plot often represents them with individual marks rather than extending the whiskers to include them. The whiskers then stop at the most extreme data points that are not outliers, instead of reaching all the way to the minimum and maximum. A common definition treats as an outlier any data point that lies more than \(1.5 \times \text{IQR}\) below the first quartile or above the third quartile, where IQR is the interquartile range (the third quartile minus the first quartile).
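A minimal Python sketch of this rule (often called Tukey's fences) is shown below. It assumes the median-of-halves quartile method from earlier in this article, with the middle value excluded for odd counts; other quartile conventions can shift the fences slightly. The function name `tukey_whiskers` and the data set with an added value of 30 are hypothetical.

```python
from statistics import median


def tukey_whiskers(values, k=1.5):
    """Return (whisker_low, whisker_high, outliers) using k x IQR fences.

    Quartiles are the medians of the lower and upper halves, with the
    middle value excluded from both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1, q3 = median(data[:half]), median(data[n - half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme values that are not outliers.
    return inside[0], inside[-1], outliers


print(tukey_whiskers([2, 3, 5, 5, 7, 10, 10, 11, 30]))  # (2, 11, [30])
```

Here Q1 is 4 and Q3 is 10.5, so the upper fence is \(10.5 + 1.5 \times 6.5 = 20.25\); the value 30 is drawn as a separate mark and the upper whisker stops at 11.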