Measures of Dispersion for Numerical Data Study Guide
Introduction to Measures of Dispersion for Numerical Data
When describing a distribution, it is important to have some measure of where the middle is. With a little thought, we can see that, if we have a measure of the middle, we still need more information to describe the distribution fully. For example, are all of the population values the same or are they spread out over a range of values? If they are spread out, how should we measure the spread? Measuring the spread, or dispersion, of population or sample values is the focus of this lesson.
Population Measures of Dispersion
Two distributions can have the same middle but still look very different. As an illustration, consider the two populations presented graphically in the dotplots in Figure 6.1.
For each of the populations, the mean, median, and mode are all 15. Yet the populations are different; one is more spread out than the other. Measures of dispersion are used to quantify the spread of a distribution. Range, interquartile range, mean absolute deviation, and standard deviation are four such measures that will be discussed in this lesson.
The range is the difference between the largest and smallest population values; it is the total spread in the population. In Figure 6.1, Y assumes values from 13 to 17, giving a range of 17 – 13 = 4. The range of X is 20 – 10 = 10. Because the range of X is larger than that of Y, more than twice as large in this example, X is more spread out; that is, X has a larger dispersion than Y.
Find the range of heights of the high school orchestra members that were first discussed in Lesson 4.
The tallest orchestra member is 83.8 inches tall, and the shortest is 53.5 inches tall. The range of the heights is 83.8 – 53.5 = 30.3 inches, more than 2.5 feet!
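As a quick check of the arithmetic, the range can be computed in a few lines of Python. This is only a sketch: the function name value_range is our own, and because the dotplot in Figure 6.1 is not reproduced here, the list y_population is a hypothetical population chosen to agree with the summaries quoted for Y (values from 13 to 17, with mean, median, and mode of 15).

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


# Hypothetical population (an assumption, not the data behind Figure 6.1):
# one 13, four 14s, six 15s, four 16s, and one 17.
y_population = [13] + [14] * 4 + [15] * 6 + [16] * 4 + [17]
print(value_range(y_population))  # 17 - 13

# Orchestra example: only the tallest and shortest heights are needed.
orchestra_extremes = [53.5, 83.8]
print(round(value_range(orchestra_extremes), 1))  # 83.8 - 53.5, in inches
```

Only the two extreme values enter the calculation, which is why the range says nothing about how the remaining values are arranged between them.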
Although the range in the heights of orchestra members is large (30.3 inches), most of the orchestra members are much closer in height than this indicates. The range is a crude measure of the dispersion of the population distribution. For example, consider the dotplot for the population distribution of Z in Figure 6.2.
Notice the range of Z is 4 as is the range of Y in Figure 6.1. Both Y and Z have means, medians, and modes of 15. Yet, the distribution of Z appears to be more spread out than that of Y. We need another measure to capture this dispersion. We will explore two such measures, beginning with the interquartile range.
The range is greatly affected by exceptionally small or large values in a population. Instead of looking at the range of all population values, the interquartile range measures the spread in the middle half of the data. To find the interquartile range, we must first find the quartiles. The quartiles are the values that divide the population into fourths, just as the median divided the population in half. The first quartile separates the bottom 25% of the data from the top 75% of the data. The second quartile separates the bottom 50% of the data from the top 50% of the data. But, wait. That is exactly what the median does! The second quartile and the median are different names for the same quantity. The third quartile separates the bottom 75% of the data from the top 25% of the data.
The first and third quartiles are found by separating the lower half of the population values from the upper half of the population values. If there is an odd number of population values, the median is excluded from both halves. The first quartile, denoted by Q1, is the median of the bottom half of the population. The third quartile, Q3, is the median of the top half of the population values. The interquartile range (IQR) is then:
IQR = Q3 – Q1
The IQR is the range or spread of the middle half of the population values.
Find the interquartile range of the population of orchestra members.
The median, 66.85 inches, divides the population into two halves: those members who are shorter than 66.85 inches and those who are taller than 66.85 inches. There are 31 members in each half. Because each half contains an odd number of values, the median of the lower half is the (31 + 1)/2 = 16th value in the ordered set of the 31 lower values. Thus, the first quartile is Q1 = 62.5 inches. Similarly, the third quartile is the median of the 31 upper half values. The 16th value in the ordered set of upper values is Q3 = 69.5 inches. The interquartile range is now found to be
Q3 – Q1 = 69.5 – 62.5 = 7 inches
which is a value much smaller than the range. This is the spread in the middle half of the population of heights.
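The median-of-halves rule described above can be sketched in Python as follows. The function names and the two small data sets are our own illustrations, not the orchestra data. Statistical software often uses other quartile conventions, so a library routine may return slightly different quartiles than this rule does.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so that the median
    # belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def interquartile_range(values):
    """Return Q3 - Q1, the spread of the middle half of the values."""
    q1, _, q3 = quartiles_by_halves(values)
    return q3 - q1


# Hypothetical data with an odd count: the median, 7, is excluded from both halves.
odd_data = [2, 4, 5, 7, 8, 10, 13]
print(quartiles_by_halves(odd_data))   # Q1 = 4, median = 7, Q3 = 10
print(interquartile_range(odd_data))   # 10 - 4

# Hypothetical data with an even count: every value falls in one of the halves.
even_data = [1, 3, 5, 7, 9, 11, 13, 15]
print(quartiles_by_halves(even_data))  # Q1 = 4, median = 8, Q3 = 12
```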
Mean Absolute Deviation
Before describing the next measure of dispersion, we need to define what is meant by a deviation. The quantity (Xi – μ) is the deviation of the ith population value from the population mean. Taking the absolute value of the deviation gives a measure of how far the value is from the mean. For the orchestra members, the shortest person has a deviation of 53.5 – 66.4 = –12.9 inches. The negative sign indicates that this person's height is below the mean; that is, the shortest member's height is 12.9 inches below the mean height of all orchestra members. The member who is 74 inches tall has a deviation of 74 – 66.4 = 7.6 inches. He is 7.6 inches taller than the average height of all orchestra members. If we add all of the deviations together, we get zero.
Mean absolute deviation is the mean (or average) distance of the population values from the population mean; that is,
mean absolute deviation = E(|X – μ|) = Σ|Xi – μ| / N
where the sum is taken over all N values in the population.
Now, (Xi – μ) is the deviation of the ith population value from the mean. The absolute deviation of the ith population value from the population mean is the distance of that value from the population mean, |Xi – μ|. For the populations of X, Y, and Z, we have μ = 15. For a population value of 13, the distance of that value from the mean is |13 – 15| = |–2| = 2. The mean absolute deviation is the population mean of these distances, E(|X – μ|).
The mean absolute deviation of the population distribution of Y is 0.75, and that of Z is 1.14. Notice that the greater dispersion in the distribution of Z as compared to Y is captured in the mean absolute deviation.
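The calculation of a mean absolute deviation can be sketched in Python. The function name is our own, and y_population is a hypothetical population (the dotplot for Y is not reproduced here) chosen so that its mean is 15 and its mean absolute deviation matches the 0.75 quoted for Y.

```python
def mean_absolute_deviation(values):
    """Return the average distance of the values from their mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)


# Hypothetical population (an assumption, not the data behind Figure 6.1):
# one 13, four 14s, six 15s, four 16s, and one 17.
y_population = [13] + [14] * 4 + [15] * 6 + [16] * 4 + [17]

mu = sum(y_population) / len(y_population)
deviations = [x - mu for x in y_population]

print(mu)                                     # population mean
print(sum(deviations))                        # signed deviations cancel out
print(mean_absolute_deviation(y_population))  # average of the distances
```

The signed deviations sum to zero, which is why the absolute values are taken before averaging.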
Find the mean absolute deviation of the heights of the 62 orchestra members. Interpret the mean absolute deviation in the context of the problem.
We begin by finding the deviation and the absolute deviation from the mean for each orchestra member. The mean height was 66.4 inches, so the deviation for a particular orchestra member is that member's height minus 66.4. The absolute deviation is the absolute value of the deviation, or how far the member's height is from the mean. These are given in Table 6.1.
The mean absolute deviation is the average of the absolute deviations. For the orchestra members, the mean absolute deviation is 4.32 inches. This means that, on average, an orchestra member's height is 4.32 inches from the population mean height of 66.4 inches.
Variance and Standard Deviation
Two more measures of dispersion are the variance and the standard deviation. The variance is the mean squared distance of the population values from the mean; that is,
σ² = E[(X – μ)²] = Σ(Xi – μ)² / N
Notice that we have expressed the variance as an expected value. Here, it represents the average squared deviation of a population value from the mean. For the population distribution of Y, displayed in Figure 6.1, the variance is σ² = 1.
For the population distribution of Z, displayed in Figure 6.2, the variance is 2.2. The units associated with the variance are the square of the measurement units of the population values. As an illustration, if the population values are recorded in inches, as they are with the orchestra members' heights, the variance is in inches². To obtain a measure of dispersion in the same units as the population values, the standard deviation is found by taking the square root of the variance; that is, σ = √σ².
Although not technically correct, the standard deviation is often described as being the average distance of a population value from the mean. For the population distribution of Y, the standard deviation is √1 = 1. For Z, displayed in Figure 6.2, the population standard deviation is √2.2 = 1.5. For Y, the mean absolute deviation is 0.75, and the standard deviation is 1. For Z, the mean absolute deviation is 1.14, and the standard deviation is 1.5. This pattern is typical: the mean absolute deviation is never larger than the standard deviation and is usually close, but not equal, to it. Although these parameters are different, we will often interpret the standard deviation as we would the mean absolute deviation. We simply need to realize that this interpretation of the standard deviation is only an approximate one.
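The population variance and standard deviation can be sketched in Python in the same way. The function names are our own, and y_population is a hypothetical population (not the data behind Figure 6.1) chosen so that its mean is 15 and its variance is 1, the values quoted for Y. The standard library function statistics.pvariance is used only as a cross-check.

```python
import math
from statistics import pvariance


def population_variance(values):
    """Return the average squared deviation from the mean (divisor N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std(values):
    """Return the square root of the population variance."""
    return math.sqrt(population_variance(values))


# Hypothetical population (an assumption): 13, 14 x4, 15 x6, 16 x4, 17.
y_population = [13] + [14] * 4 + [15] * 6 + [16] * 4 + [17]

variance_y = population_variance(y_population)
std_y = population_std(y_population)
mad_y = sum(abs(x - 15) for x in y_population) / len(y_population)

print(variance_y, std_y)        # variance and standard deviation
print(mad_y <= std_y)           # mean absolute deviation does not exceed sigma
print(round(math.sqrt(2.2), 1)) # standard deviation quoted for Z
```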
Which measure of dispersion should we use? It is always good to compute more than one measure of dispersion as each gives you slightly different information. The range is often reported. The mean absolute deviation is an intuitive measure of dispersion, but it is not used much in practice, primarily because it is difficult to answer more advanced statistical questions using mean absolute deviation. Most statistical methods are based on the standard deviation. However, the range, mean absolute deviation, and standard deviation are all inflated by unusually large or unusually small values in the population. When such values are present, the IQR is often the best measure of dispersion.
Find the variance and standard deviation of the orchestra members' heights.
Carrying more significant digits than reported earlier, the mean height is 66.3774194 inches. The population variance is σ² = 29.8807804; that is, the variance of the orchestra members' heights is 29.88 inches². The standard deviation is √29.8807804 = 5.47 inches.
Sample Measures of Dispersion
We now have several measures of the spread of a population distribution. The challenge is that we generally have only a sample of values from the population, so we are unable to compute the population parameters. However, when we are interested in quantifying the spread of a distribution, we can use these sample values to estimate the population range, interquartile range, mean absolute deviation, variance, and standard deviation.
Sample Range and Sample Interquartile Range
The sample range is the largest sample value minus the smallest sample value. Often, the largest and/or the smallest population values are relatively rare in the population. If a small to moderately sized sample is drawn, then it is unlikely that both the largest and the smallest population values occur in the sample. Consequently, the sample range tends to underestimate the population range.
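The tendency of the sample range to underestimate the population range can be checked by listing every possible sample from a small population. The sketch below is illustrative only: the population is hypothetical, the sample size of 4 is an arbitrary choice, and samples are drawn without replacement.

```python
from itertools import combinations

# Hypothetical population (an assumption): 13, 14 x4, 15 x6, 16 x4, 17.
population = [13] + [14] * 4 + [15] * 6 + [16] * 4 + [17]
population_range = max(population) - min(population)

# Every possible sample of 4 members, drawn without replacement.
sample_size = 4
sample_ranges = [max(s) - min(s) for s in combinations(population, sample_size)]

total = len(sample_ranges)
# Samples that contain both the smallest and the largest population value.
hits = sum(1 for r in sample_ranges if r == population_range)
mean_sample_range = sum(sample_ranges) / total

print(total, hits)                           # number of samples, number reaching the full range
print(mean_sample_range < population_range)  # the sample range falls short on average
```

A sample range can never exceed the population range, and it equals the population range only when both extremes are drawn, so its average over all samples must fall below the population value.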
The difference between the third and first sample quartiles is the sample interquartile range. The process of finding the sample quartiles is the same as the one we used to find the population quartiles. First, find the median, or second quartile, of the sample values. The first quartile is the median of the bottom half of the data, and the third quartile is the median of the upper half of the data.
Sample Mean Absolute Deviation
The sample mean absolute deviation is
Σ|Xi – X̄| / n
where X̄ is the sample mean and n is the sample size. Notice that the procedure for finding the sample mean absolute deviation is much like finding the population mean absolute deviation. The differences are that the sample mean is used instead of the population mean, and only sample values are considered.
Sample Variance and Sample Standard Deviation
The sample variance, s², is computed as
s² = Σ(Xi – X̄)² / (n – 1)
Intuitively, we would like to use n instead of (n – 1) in this equation. However, if n instead of (n – 1) is used, the sample variance is a little smaller, on average, than the population variance; that is, the estimate of the variance would be biased. Although the previous equation allows us to see how the sample variance relates to the population value, it is somewhat cumbersome to compute. A more computationally friendly way to find the sample variance is
s² = [ΣXi² – (ΣXi)²/n] / (n – 1)
Look carefully at the previous equation. In computing ΣXi², the sample values are squared and then added. For (ΣXi)², the sample values are added together and then squared. The variance and sample variance are always greater than or equal to zero, so if you compute one of these and get a negative number, you know that you have made an error and should go back and redo the computations. Finally, the sample standard deviation is the square root of the sample variance; that is, s = √s².
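Both forms of the sample variance, and the effect of dividing by n rather than (n – 1), can be illustrated with a short Python sketch. The function names and both data sets are hypothetical. The bias check lists every ordered sample of size 2 drawn with replacement from a tiny population, an assumption made so that the averages can be computed exactly rather than simulated.

```python
import math
from itertools import product
from statistics import variance


def sum_squared_deviations(values):
    """Return the sum of squared deviations from the mean of the values."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values)


def sample_variance_definition(values):
    """Sample variance from the defining formula, with divisor (n - 1)."""
    return sum_squared_deviations(values) / (len(values) - 1)


def sample_variance_shortcut(values):
    """Sample variance from the computational formula."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)  # square first, then add
    square_of_sum = sum(values) ** 2             # add first, then square
    return (sum_of_squares - square_of_sum / n) / (n - 1)


# Hypothetical sample: the two formulas should agree.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance_definition(sample), sample_variance_shortcut(sample))

# Bias check on a hypothetical population of six values.
population = [1, 2, 3, 4, 5, 6]
sigma_squared = sum_squared_deviations(population) / len(population)

n = 2
samples = list(product(population, repeat=n))  # all 36 ordered samples
avg_divisor_n_minus_1 = sum(sum_squared_deviations(s) / (n - 1) for s in samples) / len(samples)
avg_divisor_n = sum(sum_squared_deviations(s) / n for s in samples) / len(samples)

print(sigma_squared, avg_divisor_n_minus_1, avg_divisor_n)
```

In this setting, the average of the (n – 1) version over all samples matches the population variance, while the average of the n version falls below it.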
For the blink data, find the sample range, the sample interquartile range, the sample mean absolute deviation, the sample variance, and the sample standard deviation.
The largest value in the sample is 40 and the smallest is 13, so the sample range is 40 – 13 = 27.
There are 15 differences in the blink data set. The eighth value, 24, was found earlier to be the median. Because an odd number of sample values exists, the median is a sample value, and this value is excluded when finding the first and third quartiles. Seven values are below the median:
The median of these values is the fourth value, 17. This is the first sample quartile. There are also seven values above the median:
The median of these values is 29 and is the third sample quartile. The sample interquartile range is then
IQR = Q3 – Q1 = 29 – 17 = 12
The sample mean was found to be 19.7. Thus, the sample mean absolute deviation is
Notice that we reported the sample mean as 19.7, but when it came to computing the sample mean absolute deviation, we used 19.66666667. Why? Because we had only 15 sample values reported as whole numbers, we can claim, at most, one decimal of accuracy in estimating the population mean when using the sample mean, so we reported 19.7 instead of 19.6666667 or another number with even more decimal places. However, when we use this value in subsequent computations, it is best to carry as many digits as possible. Otherwise, the round-off error becomes larger in each step and can cause significant error in the final value. This is true for all computations that will be done in this workbook.
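The effect of rounding the mean too early can be seen in a small Python sketch. The sample below is hypothetical (it is not the blink data), and the helper name variance_about is our own.

```python
# Hypothetical sample of whole numbers (an assumption, not the blink data).
sample = [3, 4, 4, 5, 7, 8]
n = len(sample)

full_mean = sum(sample) / n         # all available digits are carried
rounded_mean = round(full_mean, 1)  # the value we would report


def variance_about(center):
    """Sample variance computed around the given center, divisor (n - 1)."""
    return sum((x - center) ** 2 for x in sample) / (n - 1)


variance_full = variance_about(full_mean)
variance_rounded = variance_about(rounded_mean)

print(full_mean, rounded_mean)
print(variance_full, variance_rounded)
print(variance_rounded - variance_full)  # error introduced by rounding early
```

The discrepancy here is small because only one rounded value is involved, but such errors accumulate when rounded results are fed into later steps.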
The sample variance is computed as
s² = 62.20952
and the sample standard deviation is found to be
s = √62.20952 = 7.9.
That is, the sample variance is 62.2 blinks², and the sample standard deviation is 7.9 blinks in two minutes.
Measures of Dispersion for Numerical Data In Short
Range, interquartile range, mean absolute deviation, variance, and standard deviation are all measures of the population dispersion. The range is the difference between the largest and smallest population values. The interquartile range is the difference between the third and first quartiles of the population values. The mean absolute deviation is the average distance of a population value from the population mean. The variance is the average squared distance of a population value from the mean, and the standard deviation is the square root of the variance. The sample range, sample interquartile range, sample mean absolute deviation, sample variance, and sample standard deviation are sample estimates of the corresponding population parameters.
Find practice problems and solutions for these concepts at Measures of Dispersion for Numerical Data Practice Exercises.