Measures of Dispersion for Numerical Data Study Guide
Introduction to Measures of Dispersion for Numerical Data
When describing a distribution, it is important to have some measure of where the middle is. With a little thought, we can see that, if we have a measure of the middle, we still need more information to describe the distribution fully. For example, are all of the population values the same or are they spread out over a range of values? If they are spread out, how should we measure the spread? Measuring the spread, or dispersion, of population or sample values is the focus of this lesson.
Population Measures of Dispersion
Two distributions can have the same middle but still look very different. As an illustration, consider the two populations presented graphically in the dotplots in Figure 6.1.
For each of the populations, the mean, median, and mode are all 15. Yet, they are different; one is more spread out than the other. The measures of dispersion are used to quantify the spread of the distribution. Range, interquartile range, mean absolute deviation, and standard deviation are four such measures that will be discussed in this lesson.
The range is the difference between the largest and smallest population values; it is the total spread in the population. In Figure 6.1, Y assumes values from 13 to 17, giving a range of 17 – 13 = 4. The range of X is 20 – 10 = 10. Because the range of X is larger than that of Y, more than twice as large in this example, X is more spread out; that is, X has a larger dispersion than Y.
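The range calculation can be sketched in a few lines of Python. The two lists below are hypothetical populations chosen only so that their smallest and largest values match those quoted for X and Y; they are not the data plotted in Figure 6.1, and the function name is an illustrative choice.

```python
def population_range(values):
    """Return the range: the largest value minus the smallest value."""
    return max(values) - min(values)

# Hypothetical populations (NOT the data behind Figure 6.1).
# Both have a mean, median, and mode of 15.
x_values = [10, 12, 14, 15, 15, 15, 16, 18, 20]
y_values = [13, 14, 15, 15, 15, 15, 15, 16, 17]

print(population_range(x_values))  # 20 - 10 = 10
print(population_range(y_values))  # 17 - 13 = 4
```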
Find the range of heights of the high school orchestra members that were first discussed in Lesson 4.
The tallest orchestra member is 83.8 inches tall, and the shortest is 53.5 inches tall. The range of the heights is 83.8 – 53.5 = 30.3 inches, more than 2.5 feet!
Although the range in the heights of orchestra members is large (30.3 inches), most of the orchestra members are much closer in height than this indicates. The range is a crude measure of the dispersion of the population distribution. For example, consider the dotplot for the population distribution of Z in Figure 6.2.
Notice that the range of Z is 4, as is the range of Y in Figure 6.1. Both Y and Z have means, medians, and modes of 15. Yet, the distribution of Z appears to be more spread out than that of Y. We need another measure to capture this dispersion. We will explore other such measures, beginning with the interquartile range.
The range is greatly affected by exceptionally small or large values in a population. Instead of looking at the range of all population values, the interquartile range measures the spread in the middle half of the data. To find the interquartile range, we must first find the quartiles. The quartiles are the values that divide the population into fourths, just as the median divides the population in half. The first quartile separates the bottom 25% of the data from the top 75% of the data. The second quartile separates the bottom 50% of the data from the top 50% of the data. But, wait. That is exactly what the median does! The second quartile and the median are different names for the same quantity. The third quartile separates the bottom 75% of the data from the top 25% of the data.
The first and third quartiles are found by separating the lower half of the population values from the upper half of the population values. If there is an odd number of population values, the median is excluded from both halves. The first quartile, denoted by Q1, is the median of the bottom half of the population. The third quartile, Q3, is the median of the top half of the population values. The interquartile range (IQR) is then:
IQR = Q3 – Q1
The IQR is the range or spread of the middle half of the population values.
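As a minimal sketch of the rule just described (split the ordered values at the median, drop the median itself when the count is odd, then take the median of each half), the following Python code may help. The function names and the list of heights are hypothetical and are not the orchestra data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # bottom half of the ordered values
    upper = ordered[half + n % 2:]   # top half; skips the median when n is odd
    return median(lower), median(ordered), median(upper)

def interquartile_range(values):
    """Return IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Hypothetical heights in inches (11 values, so the median is excluded from both halves).
heights = [60, 62, 63, 65, 66, 67, 68, 70, 71, 72, 75]
q1, q2, q3 = quartiles(heights)
print(q1, q2, q3)                    # 63 67 71
print(interquartile_range(heights))  # 71 - 63 = 8
```

Statistical software often uses other quartile conventions, so a library routine may return slightly different quartiles for the same data.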
Find the interquartile range of the population of orchestra members.
The median, 66.85 inches, divides the population into two halves: those members who are shorter than 66.85 inches and those who are taller than 66.85 inches. There are 31 members in each half. Because there is an odd number of values in each half, the median of the lower half is the (31 + 1)/2 = 16th value in the ordered set of the 31 lower values. Thus, the first quartile is Q1 = 62.5 inches. Similarly, the third quartile is the median of the 31 upper-half values. The 16th value in the ordered set of upper values is Q3 = 69.5 inches. The interquartile range is now found to be
Q3 – Q1 = 69.5 – 62.5 = 7 inches
which is a value much smaller than the range. This is the spread in the middle half of the population of heights.
Mean Absolute Deviation
Before describing the next measure of dispersion, we need to define what is meant by a deviation. The quantity (Xi – μ) is the deviation of the ith population value from the population mean. Taking the absolute value of the deviation gives a measure of how far the value is from the mean. For the orchestra members, the shortest person has a deviation of 53.5 – 66.4 = –12.9 inches. The negative sign indicates that this person's height is below the mean; that is, the shortest member's height is 12.9 inches below the mean height of all orchestra members. The member who is 74 inches tall has a deviation of 74 – 66.4 = 7.6 inches. He is 7.6 inches taller than the average height of all orchestra members. If we add all of the deviations together, we get zero.
Mean absolute deviation is the mean (or average) distance of the population values from the population mean; that is,
Mean absolute deviation = E(|X – μ|)
Now, (Xi – μ) is the deviation of the ith population value from the mean. The absolute deviation of the ith population value from the population mean is the distance of that value from the population mean, |Xi – μ|. For the populations of X, Y, and Z, we have μ = 15. For a population value of 13, the distance of that value from the mean is |13 – 15| = |–2| = 2. The mean absolute deviation is the population mean of these distances, E(|X – μ|).
The mean absolute deviation of the population distribution of Y is 0.75, and that of Z is 1.14. Notice that the greater dispersion in the distribution of Z as compared to Y is captured in the mean absolute deviation.
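The steps above (find the mean, take each value's distance from it, and average those distances) can be sketched as follows. The population below is hypothetical, chosen to have a mean of 15; it is not the data for Y or Z, so its mean absolute deviation differs from the values quoted above.

```python
def mean_absolute_deviation(values):
    """Population mean absolute deviation: the average of |x - mu|."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

# Hypothetical population with mean 15 (NOT the data for Y or Z).
values = [13, 14, 14, 15, 15, 16, 16, 17]
mu = sum(values) / len(values)

deviations = [x - mu for x in values]
print(sum(deviations))                  # 0.0: the deviations always cancel out
print(mean_absolute_deviation(values))  # (2+1+1+0+0+1+1+2) / 8 = 1.0
```

Because the raw deviations sum to zero, their plain average says nothing about spread; averaging the absolute values is what makes this a measure of dispersion.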
Find the mean absolute deviation of the heights of the 62 orchestra members. Interpret the mean absolute deviation in the context of the problem.
We begin by finding the deviation and the absolute deviation from the mean for each orchestra member. The mean height was 66.4 inches, so the deviation for a particular orchestra member is that member's height minus 66.4. The absolute deviation is the absolute value of the deviation, or how far the member's height is from the mean. These are given in Table 6.1.
The mean absolute deviation is the average of the absolute deviations. For the orchestra members, the mean absolute deviation is 4.32 inches. This means that, on average, an orchestra member's height is 4.32 inches from the population mean height of 66.4 inches.
Variance and Standard Deviation
Two more measures of dispersion are the variance and the standard deviation. The variance is the mean squared distance of the population values from the mean; that is,
σ² = E[(X – μ)²]
Notice that we have expressed the variance as an expected value. Here, it represents the average squared deviation of a population value from the mean. For the population distribution of Y, displayed in Figure 6.1, the variance is σ² = 1.
For the population distribution of Z, displayed in Figure 6.2, the variance is 2.2. The units associated with the variance are the square of the measurement units of the population values. As an illustration, if the population values are recorded in inches, as they are with the orchestra members' heights, the variance is in inches². To obtain a measure of dispersion in the same units as the population values, the standard deviation is found to be the square root of the variance; that is, σ = √σ².
Although not technically correct, the standard deviation is often described as being the average distance of a population value from the mean. For the population distribution of Y, the standard deviation is √1 = 1. For Z, displayed in Figure 6.2, the population standard deviation is √2.2 ≈ 1.5. The mean absolute deviation is usually close, but not equal, to the standard deviation. For Y, the mean absolute deviation is 0.75, and the standard deviation is 1; for Z, the mean absolute deviation is 1.14, and the standard deviation is 1.5. Although these parameters are different, we will often interpret the standard deviation as we would the mean absolute deviation. We simply need to realize that this interpretation of the standard deviation is only an approximate one.
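To see how the variance, the standard deviation, and the mean absolute deviation relate on concrete numbers, here is a small sketch. The population is hypothetical (it is not Y or Z), and the function names are illustrative choices.

```python
from math import sqrt

def population_variance(values):
    """Average squared deviation from the population mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_sd(values):
    """Square root of the population variance, in the original units."""
    return sqrt(population_variance(values))

def mean_absolute_deviation(values):
    """Average distance of the values from the population mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

# Hypothetical population with mean 15 (NOT the data for Y or Z).
values = [13, 14, 14, 15, 15, 16, 16, 17]

variance = population_variance(values)  # (4+1+1+0+0+1+1+4) / 8 = 1.5
sd = population_sd(values)              # sqrt(1.5), roughly 1.22
mad = mean_absolute_deviation(values)   # 8 / 8 = 1.0

print(variance, sd, mad)
```

For this population the mean absolute deviation (1.0) is close to, but smaller than, the standard deviation (about 1.22). Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the same population quantities.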
Which measure of dispersion should we use? It is always good to compute more than one measure of dispersion as each gives you slightly different information. The range is often reported. The mean absolute deviation is an intuitive measure of dispersion, but it is not used much in practice, primarily because it is difficult to answer more advanced statistical questions using mean absolute deviation. Most of the statistical methods are based on the standard deviation. However, the range, mean absolute deviation, and standard deviation are all inflated by unusually large or unusually small values in the population. When such values are present, the IQR is often the best measure of dispersion.
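The effect of an unusual value on each measure can be illustrated with a short sketch. The heights below are hypothetical, and the quartiles use the median-of-halves rule from this lesson; the helper name `spread_summary` is an illustrative choice.

```python
from math import sqrt
from statistics import median

def spread_summary(values):
    """Return the range, IQR, mean absolute deviation, and standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    mu = sum(ordered) / n
    half = n // 2
    q1 = median(ordered[:half])           # median of the bottom half
    q3 = median(ordered[half + n % 2:])   # median of the top half
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "mad": sum(abs(x - mu) for x in ordered) / n,
        "sd": sqrt(sum((x - mu) ** 2 for x in ordered) / n),
    }

# Hypothetical heights in inches.
typical = [60, 62, 63, 65, 66, 67, 68, 70, 71, 72, 75]
# Same population, but the tallest value is replaced by an unusually large one.
with_outlier = typical[:-1] + [95]

before = spread_summary(typical)
after = spread_summary(with_outlier)

for name in ("range", "iqr", "mad", "sd"):
    print(name, before[name], after[name])
```

In this example the range grows from 15 to 35, and the mean absolute deviation and standard deviation both increase, while the IQR stays at 8 because the middle half of the values is unchanged.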