Data Presentation - Histogram
A histogram is a graphical method of displaying quantitative data, similar to a box plot or stem-and-leaf plot. A histogram displays a single quantitative variable along the x-axis and the frequency of its values on the y-axis. The distinguishing feature of a histogram is that the data are grouped into "bins", which are intervals on the x-axis.
Making a Histogram
This histogram was created with the following numbers:
[Data table and histogram figure omitted]
To construct a histogram, first decide on the bin size, that is, the width of the intervals your numbers will be sorted into. There are many possible sizes and numbers of bins for a given set of data. This particular example uses 6 bins with a size of 10.
For the histograms covered here, every bin must be the same size, and every data value must fall into exactly one bin.
After deciding this, put each number into its corresponding interval.
Now it resembles the histogram. The height of each bin is the number of elements inside. Be sure to label the x and y axes.
Choosing bin sizes is the most important part of constructing a histogram. Consider these histograms. They graph the same data as the first example.
Can you identify the difference? The left graph has twelve bins of width 5 and the right graph has three bins of width 20. Both cover the same range and contain all the data. However, both graphs give a distorted picture of the data.
Bins that are too large cover up trends and nuances, while bins that are too small break the data into many short, jagged bars that can be just as difficult to analyze.
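The binning step and the effect of bin width can be sketched in a few lines of Python. This is only an illustration: the `sample` list is made-up data (the numbers from the example above are not reproduced here), the helper name `bin_counts` is hypothetical, and each bin is assumed to include its lower edge but not its upper edge.

```python
def bin_counts(values, width, start, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bin i covers the interval [start + i*width, start + (i+1)*width);
    including the lower edge only is an assumed convention.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} does not fall in any bin")
        counts[index] += 1
    return counts


# Hypothetical data covering the range 0 to 60 (not the article's data).
sample = [3, 8, 12, 14, 17, 21, 25, 26, 28, 33, 35, 41, 47, 52, 58]

narrow = bin_counts(sample, width=5, start=0, n_bins=12)
medium = bin_counts(sample, width=10, start=0, n_bins=6)
wide = bin_counts(sample, width=20, start=0, n_bins=3)

print("width 5: ", narrow)
print("width 10:", medium)
print("width 20:", wide)
```

All three lists add up to the same total, since every value lands in exactly one bin; only the level of detail changes.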
Analyzing a Histogram
There are four main things to describe about a histogram. There's a nifty acronym to help you remember - SOCS!
Shape: Some adjectives used to describe shape are...
unimodal, one peak; bimodal, two peaks; or uniform, for no clear peaks;
symmetric, the left half looks similar to the right; or asymmetric
skewed left, the data seems to be squished to the right and trails to the left; or skewed right, the data seems to be squished to the left and trails to the right.
You should also note any gaps in the graph.
Outliers: Are there any data that look far removed from the main group? These would be classified as extreme values. Any other outstanding or unusual features should be noted.
Center: There are two different ways to classify the center...
Median, the halfway point so that there are just as many data values to the left of the median as there are to the right. Use this for a skewed graph.
Mean, the average of the data values. Use this for a symmetric graph.
Spread: Spread refers to the variation of the data. There are a few different ways to measure variation...
Range, the distance between the maximum and minimum values. It is generally used only as a rough estimate, and it is especially poor when there are outliers.
Interquartile Range (IQR), the distance between the first and third quartiles of the data set. It is better than the regular range because it is not affected by outliers.
Standard Deviation, which measures how far the data values typically lie from the mean and is somewhat complicated to calculate. Shouldn't be needed for the SAT.
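When the actual numbers are available, the measures of center and spread listed above can be computed directly. The sketch below uses Python's standard `statistics` module on a made-up, right-skewed data set with one extreme value. Quartile conventions differ between textbooks and calculators, so the IQR shown (from the module's default method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical data: mostly small values plus one extreme value (30).
data = [2, 3, 3, 4, 5, 5, 6, 7, 30]

mean = statistics.mean(data)      # pulled upward by the extreme value
median = statistics.median(data)  # the middle value, barely affected

data_range = max(data) - min(data)

# quantiles(n=4) returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Population standard deviation; statistics.stdev gives the sample version.
std_dev = statistics.pstdev(data)

# The same range after removing the extreme value.
trimmed = data[:-1]
trimmed_range = max(trimmed) - min(trimmed)

print("mean:", round(mean, 2), "median:", median)
print("range:", data_range, "range without 30:", trimmed_range)
print("IQR:", iqr, "standard deviation:", round(std_dev, 2))
```

In this example the mean lands above the median and the range shrinks sharply once the extreme value is dropped, which is why the median and IQR are preferred for skewed data.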
Examples
How many data were used to create the following histogram? Assume the y-axis scale is 1.
Adding the height of each bin gives \(2+3+4+6+1+1+1=18 \).
Describe the following histogram:
The distribution is skewed right and unimodal with a peak between 400 and 450. There are possible outliers toward 650. The median should be used as a measure of center. The range of the distribution is moderate.
Notes: Since we are only given the picture and no numbers or data to work with, we have to be somewhat vague. We can only speculate about outliers without calculating anything, so the word "possible" is important. Words like "moderate", "somewhat", "slightly", and so on are usually acceptable for these types of descriptions.
Construct a histogram to represent these data:
12, 15, 16, 21, 22, 31, 39, 42, 46, 53, 54, 55, 57, 59, 61, 62, 67, 67, 70, 71, 78, 83, 87, 89, 91, 96, 97, 98, 98, 100
The maker of this problem was nice enough to sort the data from least to greatest. Decide on a bin size; there are multiple reasonable answers. This solution uses 10 bins of width 10.
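One way to check the bin counts for this problem is the short Python sketch below. It makes a few assumptions not stated in the problem: the first value is read as 12 (consistent with the list being sorted), the bins start at 10, and each bin includes its lower edge only, so the bins are 10-19, 20-29, and so on up to 100-109. Other reasonable choices give slightly different pictures.

```python
# Data from the problem; the first value is taken as 12 (assumed, since
# the list is described as sorted from least to greatest).
data = [12, 15, 16, 21, 22, 31, 39, 42, 46, 53, 54, 55, 57, 59, 61,
        62, 67, 67, 70, 71, 78, 83, 87, 89, 91, 96, 97, 98, 98, 100]

WIDTH = 10   # bin width
START = 10   # lower edge of the first bin (an assumed choice)
N_BINS = 10  # bins cover 10-19, 20-29, ..., 100-109

counts = [0] * N_BINS
for value in data:
    counts[(value - START) // WIDTH] += 1

# Print a sideways text histogram: one '#' per data value.
for i, count in enumerate(counts):
    low = START + i * WIDTH
    high = low + WIDTH - 1
    print(f"{low:>3}-{high:<3} | {'#' * count}")
```

The bar heights add up to 30, the number of data values, which is a quick check that nothing was left out or counted twice.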