Measures of Spread for AP Statistics (page 2)

By — McGraw-Hill Professional
Updated on Feb 5, 2011

Interquartile Range

Although the standard deviation works well in situations where the mean works well (reasonably symmetric distributions), we need a measure of spread that works well when a mean-based measure is not appropriate. That measure is called the interquartile range.

Remember that the median of a distribution divides the distribution in two: it is the middle of the distribution. The medians of the upper and lower halves of the distribution, not including the median itself in either half, are called quartiles. The median of the lower half is called the lower quartile, or the first quartile (which is the 25th percentile, Q1 on the calculator). The median of the upper half is called the upper quartile, or the third quartile (which is the 75th percentile, Q3 on the calculator). The median itself can be thought of as the second quartile, or Q2 (although we usually don't call it that).

The interquartile range (IQR) is the difference between Q3 and Q1. That is, IQR = Q3 – Q1. When you do 1-Var Stats, the calculator returns Q1 and Q3 along with several other summary statistics, but not the IQR itself; you have to compute it from Q1 and Q3. Note that the IQR is the width of the interval that contains the middle 50% of the data.

example: Find Q1, Q3, and the IQR for the following dataset: 5, 5, 6, 7, 8, 9, 11, 13, 17.

solution: Because the data are in order and there is an odd number of values (9), the median is the fifth value, 8. The bottom half of the data comprises 5, 5, 6, 7. Its median is the average of 5 and 6, or 5.5, which is Q1. Similarly, the top half comprises 9, 11, 13, 17, and its median is the average of 11 and 13, or 12, which is Q3. The IQR = 12 - 5.5 = 6.5.
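The median-of-halves procedure used in this solution can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `quartiles` is made up for this example, and built-in quantile functions in software packages may follow different conventions and return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    `quartiles` is a hypothetical helper written for this illustration.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # skips the median itself when n is odd
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles([5, 5, 6, 7, 8, 9, 11, 13, 17])
iqr = q3 - q1
print(q1, med, q3, iqr)  # 5.5 8 12.0 6.5
```

When the number of values is even, no single value sits at the median position, so the two halves together contain every value.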

example: Find the standard deviation and IQR for the number of home runs hit by Babe Ruth in his major league career. The number of home runs was: 0, 4, 3, 2, 11, 29, 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22, 6.

solution: We put these numbers into a TI-83/84 list and do 1-Var Stats on that list. The calculator returns Sx = 20.21, Q1 = 11, and Q3 = 47. Hence the IQR = Q3 – Q1 = 47 – 11 = 36.
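As a cross-check on the calculator output, the same statistics can be reproduced with Python's standard `statistics` module. This is only a sketch: it assumes the median-of-halves definition of the quartiles given earlier, and the variable names are chosen for this example.

```python
from statistics import median, stdev

home_runs = [0, 4, 3, 2, 11, 29, 54, 59, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]

data = sorted(home_runs)
n = len(data)
q1 = median(data[:n // 2])           # median of the lower half
q3 = median(data[n // 2 + n % 2:])   # median of the upper half
sx = stdev(home_runs)                # sample standard deviation (divides by n - 1)

print(round(sx, 2), q1, q3, q3 - q1)  # 20.21 11 47 36
```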

The range of the distribution is the difference between the maximum and minimum scores in the distribution. For the home run data, the range equals 60 - 0 = 60. Although this is sometimes used as a measure of spread, it is not very useful because we are usually interested in how the data spread out from the center of the distribution, not in just how far it is from the minimum to the maximum values.

Outliers

We have a pretty good intuitive sense of what an outlier is: it's a value far removed from the others. There is no rigorous mathematical formula for determining whether or not something is an outlier, but there are a few conventions that people seem to agree on. Not surprisingly, some of them are based on the mean and some are based on the median!

A commonly agreed-upon way to think of outliers based on the mean is to consider how many standard deviations away from the mean a value is. Some texts define an outlier as a datapoint that is more than two or three standard deviations from the mean.

In a mound-shaped, symmetric distribution, such a value has only about a 5% chance (for two standard deviations) or a 0.3% chance (for three standard deviations) of being as far from the center of the distribution as it is. Think of it as a value that is way out in one of the tails of the distribution.
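The 5% and 0.3% figures can be checked numerically if we assume, for illustration only, that "mound-shaped and symmetric" means exactly normal. The helper name `two_tailed_prob` is made up for this sketch.

```python
from math import erf, sqrt

def two_tailed_prob(k):
    """P(|Z| > k) for a standard normal variable Z (normality is an assumption)."""
    return 1 - erf(k / sqrt(2))

for k in (2, 3):
    # Percentage of values more than k standard deviations from the mean
    print(k, round(100 * two_tailed_prob(k), 2))
# 2 4.55
# 3 0.27
```

The exact normal values, about 4.55% and 0.27%, are what the text rounds to 5% and 0.3%.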

Most texts now use a median-based measure and define outliers in terms of how far a datapoint is above or below the quartiles in a distribution. To determine whether a distribution has any outliers, do the following (this is known as the "1.5 (IQR) rule"):

  • Find the IQR.
  • Multiply the IQR by 1.5.
  • Find Q1 – 1.5(IQR) and Q3 + 1.5(IQR).
  • Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is an outlier.

Some texts call an outlier defined as above a mild outlier. An extreme outlier would then be one that lies more than 3 IQRs beyond Q1 or Q3.
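The 1.5 (IQR) rule and the mild/extreme distinction can be sketched as a small Python function. The function name `classify_outliers` is hypothetical. The quartiles Q1 = 5.5 and Q3 = 12 come from the first worked example in this article; the values 25 and 40 are made-up candidates used only to show how the fences behave, and they are not part of that dataset.

```python
def classify_outliers(values, q1, q3):
    """Split values into (mild, extreme) outliers using the 1.5 (IQR) rule.

    Hypothetical helper: mild outliers lie more than 1.5 IQRs beyond a
    quartile, extreme outliers more than 3 IQRs beyond a quartile.
    """
    iqr = q3 - q1
    mild_lo, mild_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    ext_lo, ext_hi = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for x in values:
        if x < ext_lo or x > ext_hi:
            extreme.append(x)
        elif x < mild_lo or x > mild_hi:
            mild.append(x)
    return mild, extreme

# Fences for Q1 = 5.5, Q3 = 12: mild at -4.25 and 21.75, extreme at -14 and 31.5
print(classify_outliers([5, 5, 6, 7, 8, 9, 11, 13, 17], 5.5, 12))  # ([], [])
print(classify_outliers([25, 40], 5.5, 12))                        # ([25], [40])
```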

example: The following data represent the amount of money, in British pounds, spent weekly on tobacco for 11 regions in Britain: 4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56. Do any of the regions seem to be spending a lot more or less than the other regions? That is, are there any outliers in the data?

solution: Using a calculator, we find x̄ = 3.62, Sx = s = 0.59, Q1 = 3.20, Q3 = 4.03.

  • Using the mean and standard deviation: x̄ ± 2s = 3.62 ± 2(0.59) gives the interval (2.44, 4.80). There are no values in the dataset less than 2.44 or greater than 4.80, so there are no outliers by this method. We don't need to check ± 3s since there were no outliers using ± 2s.
  • Using the 1.5 (IQR) rule: Q1 - 1.5(IQR) = 3.20 - 1.5(4.03 - 3.20) = 1.96, and Q3 + 1.5(IQR) = 4.03 + 1.5(4.03 - 3.20) = 5.28 (both rounded to two decimal places). Because there are no values in the data less than 1.96 or greater than 5.28, there are no outliers by this method either.
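Both checks on the tobacco data can be reproduced in a few lines of Python. This is a sketch under the same assumptions as the text: the sample standard deviation for the mean-based check and median-of-halves quartiles for the 1.5 (IQR) rule. The variable names are chosen for this example.

```python
from statistics import mean, median, stdev

spending = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]

# Mean-based check: values more than 2 standard deviations from the mean
xbar, s = mean(spending), stdev(spending)
mean_based = [x for x in spending if abs(x - xbar) > 2 * s]

# Median-based check: the 1.5 (IQR) rule
data = sorted(spending)
n = len(data)
q1 = median(data[:n // 2])
q3 = median(data[n // 2 + n % 2:])
iqr = q3 - q1
iqr_based = [x for x in spending if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(mean_based, iqr_based)  # [] []
```

Both lists come back empty, which agrees with the conclusion that neither method flags an outlier.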

Outliers are important because they will often tell us that something unusual or unexpected is going on with the data that we need to know about. A manufacturing process that produces products so far out of spec that they are outliers often indicates that something is wrong with the process. Sometimes outliers are just a natural, but rare, variation. Often, however, an outlier can indicate that the process generating the data is out of control in some fashion.
