Education.com

Measures of Dispersion for Numerical Data Study Guide (page 3)

Updated on Oct 5, 2011

Example

Find the variance and standard deviation of the orchestra members' heights.

Solution

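Carrying more significant digits than reported earlier, the mean height is 66.3774194 inches. The population variance is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - 66.3774194\right)^2 = 29.8807804;$$

that is, the variance of the orchestra members' heights is 29.88 inches². The standard deviation is √29.8807804 = 5.47 inches.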
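As a cross-check on the arithmetic, here is a minimal Python sketch of the population formulas: sum the squared deviations from the population mean, divide by N, then take the square root. The function names and the five heights are hypothetical illustrations; the orchestra data set itself is not reproduced on this page.

```python
import math

def population_variance(values):
    """Average squared distance from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_sd(values):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(population_variance(values))

# Hypothetical heights in inches (NOT the orchestra data).
heights = [62.0, 64.5, 66.0, 68.5, 71.0]
print(population_variance(heights), population_sd(heights))

# Final step of the example: the square root of the reported variance.
print(round(math.sqrt(29.8807804), 2))  # 5.47
```

Because the variance is in squared units (inches²), the square root is what brings the spread back to inches.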


Sample Measures of Dispersion

We now have several measures of the spread of a population distribution. The challenge is that we generally have only a sample of values from the population, so we are unable to compute the population parameters. However, we can use the sample values to estimate the population range, interquartile range, mean absolute deviation, variance, and standard deviation.

Sample Range and Sample Interquartile Range

The sample range is the largest sample value minus the smallest sample value. Often, the largest and/or the smallest population values are relatively rare in the population. If a small to moderately sized sample is drawn, then it is unlikely that both the largest and the smallest population values occur in the sample. Consequently, the sample range tends to underestimate the population range.
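The underestimation can be checked exactly on a toy example. The sketch below assumes a hypothetical population {1, 2, 3, 4, 5} and samples of size 3 drawn with replacement (a simplifying assumption), enumerates every equally likely ordered sample, and averages the sample ranges using exact fractions.

```python
from fractions import Fraction
from itertools import product

def sample_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

population = [1, 2, 3, 4, 5]          # hypothetical population
pop_range = sample_range(population)  # 5 - 1 = 4

# Every ordered sample of size 3 drawn with replacement; each is equally likely.
samples = list(product(population, repeat=3))
avg_range = Fraction(sum(sample_range(s) for s in samples), len(samples))
print(pop_range, avg_range, float(avg_range))
```

For this toy setup the average sample range works out to 12/5 = 2.4, well below the population range of 4; only samples that happen to contain both 1 and 5 reach the full range.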

The difference between the third and first sample quartiles is the sample interquartile range. The process of finding the sample quartiles is the same as the one we used to find the population quartiles. First, find the median, or second quartile, of the sample values. The first quartile is the median of the bottom half of the data, and the third quartile is the median of the upper half of the data. When the number of sample values is odd, the median is itself a sample value and is excluded from both halves.
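The median-of-halves procedure can be sketched in Python as follows. The function names and the nine-value sample are hypothetical; the rule itself (drop the median from both halves when n is odd) follows the description in this guide.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def sample_quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    When n is odd, the median value is left out of both halves.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return median(lower), median(s), median(upper)

def sample_iqr(values):
    """Third sample quartile minus first sample quartile."""
    q1, _, q3 = sample_quartiles(values)
    return q3 - q1

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical sample, n = 9
print(sample_quartiles(data), sample_iqr(data))  # (6.0, 12, 16.0) 10.0
```

Statistical software often uses other quartile conventions (for example, interpolating between ordered values), so its results can differ slightly from this hand method.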

Sample Mean Absolute Deviation

The sample mean absolute deviation is

$$\frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|,$$

where $\bar{x}$ is the sample mean and n is the sample size. Notice that the procedure for finding the sample mean absolute deviation is much like the one for finding the population mean absolute deviation. The differences are that the sample mean is used instead of the population mean, and only sample values are considered.
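A minimal Python sketch of the formula above; the function name and the five-value sample are hypothetical.

```python
def sample_mean_absolute_deviation(values):
    """Average distance of the sample values from the sample mean."""
    n = len(values)
    xbar = sum(values) / n  # sample mean
    return sum(abs(x - xbar) for x in values) / n

sample = [2, 4, 4, 5, 10]  # hypothetical sample; mean is 5
print(sample_mean_absolute_deviation(sample))  # 2.0
```

Here the distances from the mean are 3, 1, 1, 0, and 5, which total 10, so the average distance is 10 / 5 = 2.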

Sample Variance and Sample Standard Deviation

The sample variance, s², is computed as

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2.$$
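The effect of the divisor can be checked exactly on a tiny example. The sketch below assumes a hypothetical population {1, 2, 3}, whose population variance is 2/3, and samples of size n = 2 drawn with replacement (independent draws). It averages the variance estimate over all nine equally likely samples, once dividing by n and once by n – 1, using exact fractions to avoid round-off.

```python
from fractions import Fraction
from itertools import product

def var_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

population = [1, 2, 3]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2
samples = list(product(population, repeat=n))  # 9 equally likely ordered samples
avg_div_n = sum(var_with_divisor(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(var_with_divisor(s, n - 1) for s in samples) / len(samples)
print(sigma2, avg_div_n, avg_div_n_minus_1)  # 2/3 1/3 2/3
```

Dividing by n averages 1/3, only half the population value, while dividing by n – 1 averages exactly 2/3. With independent draws, the divisor-n version averages (n – 1)/n times the population variance in general.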

Intuitively, we would like to use n instead of (n – 1) in this equation. However, if n is used instead of (n – 1), the sample variance is a little smaller, on average, than the population variance; that is, it would be a biased estimate of the population variance. Although the defining equation shows how the sample variance relates to the population value, it is somewhat cumbersome to compute. A more computationally friendly way to find the sample variance is

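$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right].$$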

Look carefully at the previous equation. In computing $\sum_{i=1}^{n} x_i^2$, the sample values are squared and then added. For $\left(\sum_{i=1}^{n} x_i\right)^2$, the sample values are added together and then squared. The population variance and the sample variance are always greater than or equal to zero, so if you compute one of these and get a negative number, you know that you have made an error and should go back and redo the computations. Finally, the sample standard deviation is the square root of the sample variance; that is, $s = \sqrt{s^2}$.
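A minimal Python sketch comparing the defining formula with the computational shortcut on a hypothetical sample; both should return the same s², and s is its square root. The function names are illustrative, and the non-negativity check mirrors the advice above.

```python
import math

def sample_variance_definitional(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_computational(values):
    """s^2 via the shortcut: (sum of squares - (square of sum) / n) / (n - 1)."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)  # square each value, then add
    square_of_sum = sum(values) ** 2             # add the values, then square
    return (sum_of_squares - square_of_sum / n) / (n - 1)

data = [2, 4, 4, 5, 10]  # hypothetical sample
s2 = sample_variance_computational(data)
assert s2 >= 0, "a negative variance signals an arithmetic error"
s = math.sqrt(s2)
print(sample_variance_definitional(data), s2, s)  # 9.0 9.0 3.0
```

In floating-point arithmetic the shortcut subtracts two large, nearly equal numbers when the values are large but tightly clustered, which can lose precision; for small hand-sized data sets like the ones in this guide it is convenient and accurate.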

Example

For the blink data, find the sample range, the sample interquartile range, the sample mean absolute deviation, the sample variance, and the sample standard deviation.

Solution

The largest value in the sample is 40 and the smallest is 13, so the sample range is 40 – 13 = 27.

There are 15 differences in the blink data set. The eighth value, 24, was found earlier to be the median. Because an odd number of sample values exists, the median is a sample value, and this value is excluded when finding the first and third quartiles. Seven values are below the median:

The median of these values is the fourth value, 17. This is the first sample quartile. There are also seven values above the median:

The median of these values is 29, which is the third sample quartile. The sample interquartile range is then 29 – 17 = 12.

The sample mean was found to be 19.7. Thus, the sample mean absolute deviation is

$$\frac{1}{15}\sum_{i=1}^{15}\left|x_i - 19.66666667\right|.$$

Notice that we reported the sample mean as 19.7, but when it came to computing the sample mean absolute deviation, we used 19.66666667. Why? Because the 15 sample values were reported as whole numbers, we can claim at most one decimal place of accuracy when using the sample mean to estimate the population mean, so we reported 19.7 rather than 19.66666667 or another number with even more decimal places. However, when we use this value in subsequent computations, it is best to carry as many digits as possible. Otherwise, the round-off error grows with each step and can cause significant error in the final value. The same practice applies to all computations in this workbook.
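One way rounding early hurts can be seen exactly. The sketch below uses a hypothetical three-value sample (not the blink data) and exact fractions: the sum of squared deviations about a rounded mean always exceeds the sum about the exact mean by n times the squared rounding error, so a variance computed from a rounded mean is pushed upward.

```python
from fractions import Fraction

def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)

data = [1, 2, 4]                     # hypothetical whole-number sample
n = len(data)
exact_mean = Fraction(sum(data), n)  # 7/3 = 2.333...
rounded_mean = round(exact_mean, 1)  # 23/10 = 2.3, as it might be reported

exact_ss = sum_sq_dev(data, exact_mean)
rounded_ss = sum_sq_dev(data, rounded_mean)
print(exact_ss, rounded_ss, rounded_ss - exact_ss)  # 14/3 467/100 1/300

# The extra amount is always n * (exact mean - rounded mean)^2, never negative.
assert rounded_ss - exact_ss == n * (exact_mean - rounded_mean) ** 2
```

The gap here is small, but each later step that reuses a rounded intermediate can add another error of this kind.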

The sample variance is computed as

$$s^2 = \frac{1}{15 - 1}\sum_{i=1}^{15}\left(x_i - \bar{x}\right)^2 = 62.20952,$$

and the sample standard deviation is found to be

s = √62.20952 = 7.9.

That is, the sample variance is 62.2 blinks², and the sample standard deviation is 7.9 blinks in two minutes.

Measures of Dispersion for Numerical Data In Short

Range, interquartile range, mean absolute deviation, variance, and standard deviation are all measures of population dispersion. The range is the difference between the largest and smallest population values. The interquartile range is the difference between the third and first quartiles of the population values. The mean absolute deviation is the average distance of a population value from the population mean. The variance is the average squared distance of a population value from the population mean, and the standard deviation is the square root of the variance. The sample range, sample interquartile range, sample mean absolute deviation, sample variance, and sample standard deviation are sample estimates of the corresponding population parameters.
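Pulling the sample estimates together, here is a minimal Python sketch that computes all five for a list of numbers. It follows the hand methods described in this guide (median-of-halves quartiles, divisor n – 1 for the variance); the function name and example values are hypothetical.

```python
import math

def _median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def sample_dispersion(values):
    """Sample range, IQR (median-of-halves rule), mean absolute deviation,
    variance (divisor n - 1), and standard deviation."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two sample values")
    half = n // 2
    lower = s[:half]
    upper = s[half + n % 2:]  # skip the median itself when n is odd
    xbar = sum(s) / n
    variance = sum((x - xbar) ** 2 for x in s) / (n - 1)
    return {
        "range": s[-1] - s[0],
        "iqr": _median(upper) - _median(lower),
        "mad": sum(abs(x - xbar) for x in s) / n,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

print(sample_dispersion([2, 4, 4, 5, 10]))  # hypothetical sample
```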

Find practice problems and solutions for these concepts at Measures of Dispersion for Numerical Data Practice Exercises.
