Other Specifications—Range
There are additional descriptive measures that can be used to describe the characteristics of data. Let's look at the definitions of some of them.
Range
In a data set, or in any contiguous ("all-of-a-piece") interval in that set, the term range can be defined as the difference between the largest value and the smallest value in the set or interval.
In the graph of hypothetical blood-pressure test results (Fig. 4-1), the lowest systolic pressure in the data set is 60, and the highest is 160. Therefore, the range is the difference between these two values, or 100. It's possible that a few of the people tested have pressures lower than 60 or higher than 160, but their readings have been, in effect, thrown out of the data set.
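The range calculation is simple enough to express in a few lines. Below is a minimal Python sketch. The function name `data_range` and the sample readings are hypothetical. Only the extremes (60 and 160) come from the blood-pressure example, and the values in between were chosen just for illustration.

```python
def data_range(values):
    """Return the range of a data set: largest value minus smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical systolic readings; only the extremes (60 and 160) come from the text.
readings = [60, 95, 110, 120, 135, 160]
print(data_range(readings))  # 100
```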
In the 40-question test we've examined so often in this chapter, the lowest score is 0, and the highest score is 40. Therefore, the range is 40. We might want to restrict our attention to the range of some portion of all the scores, for example the 2nd-lowest 25% of them. This range can be determined from Table 4-4; it is equal to 24 – 17, or 7. Note that the meaning of the word "range" in this context is different from the meaning of the word "range" at the top of the left-hand column of Table 4-4.
Coefficient of Variation
Do you remember the definitions of the mean (μ) and the standard deviation (σ)? Let's review them briefly. There's an important specification that can be derived from them.
In a normal distribution, such as the one that shows the results of our hypothetical blood-pressure data-gathering adventure, the mean is the value (in this case the blood pressure) such that the area under the curve is equal on either side of a vertical line corresponding to that value.
In tabulated data for discrete elements, the mean is the arithmetic average of all the results. If we have results $\{x_1, x_2, x_3, \ldots, x_n\}$ whose mean is $\mu$, then the standard deviation is

$$\sigma = \left\{ \frac{1}{n} \left[ (x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2 \right] \right\}^{1/2}$$
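Here is a short Python sketch of these two definitions, applied to a small hypothetical data set. It divides by n, exactly as the formula does. This is the population standard deviation, which Python's `statistics.pstdev` also computes. The related `statistics.stdev` divides by n − 1 instead and would give a different answer. The helper names `mean` and `std_dev` are illustrative only.

```python
import math
import statistics


def mean(values):
    """Arithmetic average of the results."""
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation: sum of squared deviations divided by n, then square root."""
    mu = mean(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(mean(scores))     # 5.0
print(std_dev(scores))  # 2.0
# The standard library's pstdev uses the same 1/n formula.
print(math.isclose(std_dev(scores), statistics.pstdev(scores)))  # True
```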
The mean is a measure of central tendency or "centeredness." The standard deviation is a measure of dispersion or "spread-outedness." Suppose we want to know how spread out the data is relative to the mean. We can get an expression for this if we divide the standard deviation by the mean. This gives us a quantity known as the coefficient of variation, which is symbolized as CV. Mathematically, the CV is:

$$\mathrm{CV} = \frac{\sigma}{\mu}$$
The standard deviation and the mean are both expressed in the same units as the data, such as millimeters of mercury for systolic blood pressure or points for test scores. Because of this, when we divide one by the other, the units cancel each other out, so the CV doesn't have any units. A number with no units associated with it is known as a dimensionless quantity.
Because the CV is dimensionless, it can be used to compare the "spread-outedness" of data sets that describe vastly different things, such as blood pressures and test scores. A large CV means that data is relatively spread out around the mean. A small CV means that data is concentrated closely around the mean. In the extreme, if CV = 0, all the data values are the same, and are exactly at the mean. Figure 4-6 shows two distributions in graphical form, one with a fairly low CV, and the other with a higher CV.
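As a rough illustration of this comparison, the following Python sketch computes the CV for two hypothetical data sets. Both have a mean of 50, but their spreads differ. The function name `coefficient_of_variation` is an assumption for illustration. It uses the population standard deviation, to match the formula given earlier. If the mean is 0, the function raises an error rather than dividing by zero.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(values) / mu


# Two hypothetical data sets with the same mean (50) but different spreads.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]
print(round(coefficient_of_variation(tight), 3))  # 0.028
print(round(coefficient_of_variation(wide), 3))   # 0.283
print(coefficient_of_variation([7, 7, 7]))        # 0.0 (all values equal)
```

The wide set's CV is ten times larger, matching its tenfold larger standard deviation around the same mean.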
There is one potential difficulty with the above formula. Have you guessed it? If you wonder what happens in a distribution where the data can attain either positive or negative values (for example, temperatures in degrees Celsius), your concern is justified. If μ = 0 (on the Celsius scale, the freezing point of water), the formula calls for division by 0, and the CV is undefined. This trouble can be avoided by changing the units in which the data is specified, so that 0 doesn't occur within the set of possible values. When expressing temperatures, for example, we could use the Kelvin scale rather than the Celsius scale; on the Kelvin scale, all temperature readings are above 0.
In a situation where all the elements in a data set are equal to 0, such as would happen if a whole class of students turned in blank papers on a test, the CV is undefined because the mean really is equal to 0, so the formula calls for division by 0.
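The Python sketch below illustrates both situations, using hypothetical data. Three temperature readings of −5, 0, and 5 degrees Celsius have a mean of exactly 0, so their CV cannot be computed. After conversion to kelvins (adding 273.15), the mean is far above 0, and the CV is defined. Note that the value obtained depends on the scale chosen. A set of all-zero test scores is also rejected. The function name is hypothetical. In this version, it returns `None` to signal that the CV is undefined.

```python
import statistics


def coefficient_of_variation(values):
    """Return sigma / mu, or None when the mean is 0 and the CV is undefined."""
    mu = statistics.fmean(values)
    if mu == 0:
        return None
    return statistics.pstdev(values) / mu


celsius = [-5.0, 0.0, 5.0]              # hypothetical readings, mean 0
kelvin = [c + 273.15 for c in celsius]  # same readings on the Kelvin scale
blank_papers = [0, 0, 0, 0]             # every student scores 0

print(coefficient_of_variation(celsius))           # None
print(round(coefficient_of_variation(kelvin), 4))  # 0.0149
print(coefficient_of_variation(blank_papers))      # None
```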
Z Score
Sometimes you'll hear people say that such-and-such an observation or result is "2.2 standard deviations below the mean" or "1.6 standard deviations above the mean." The Z score, symbolized z, is a quantitative measure of the position of a particular element with respect to the mean. The Z score of an element is equal to the number of standard deviations by which the element differs from the mean, either positively or negatively.
For a specific element x in a data set, the value of z depends on both the mean (μ) and the standard deviation (σ) and can be found from this formula:

$$z = \frac{x - \mu}{\sigma}$$
If x is below the mean, then z is a negative number. If x is above the mean, then z is a positive number. If x is equal to the mean, then z = 0.
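The sign behavior described above can be checked with a short Python sketch. The data set is hypothetical, with a mean of 5 and a population standard deviation of 2. The function `z_score` is an illustrative name, not part of any library. It refuses to divide by a standard deviation of 0.

```python
import statistics


def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sigma == 0:
        raise ValueError("z is undefined when the standard deviation is 0")
    return (x - mu) / sigma


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)
for x in (1, 5, 9):
    print(x, z_score(x, mu, sigma))
# 1 -2.0
# 5 0.0
# 9 2.0
```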
In the graphical distributions of Fig. 4-6, z > 0 for the point x shown. This is true for both curves. We can't tell from the graph alone exactly what the Z score is for x with respect to either curve, but we can at least see that it's positive in both cases.
Interquartile Range
Sometimes it's useful to know the "central half" of the data in a set. The interquartile range, abbreviated IQR, is an expression of this. The IQR is equal to the value of the 3rd quartile point minus the value of the 1st quartile point. If a quartile point occurs between two integers, it can be taken as the average of the two integers (the smaller one plus 0.5).
Consider again the hypothetical 40-question test taken by 1000 students. The quartile points are shown in Fig. 4-2A.
Fig. 4-2A. At A, positions of quartiles in the test results described in the text.
The 1st quartile occurs between scores of 16 and 17, so its value is 16.5; the 3rd quartile occurs between scores of 31 and 32, so its value is 31.5. Therefore:

$$\mathrm{IQR} = 31.5 - 16.5 = 15$$
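Here is the same midpoint-and-subtract calculation written as a Python sketch. The helper names are hypothetical. For raw scores, the standard library's `statistics.quantiles` can locate the quartile points. However, its interpolation rule (here `method="inclusive"`) is an assumption on my part and does not necessarily match the convention used in this chapter. The raw-score example therefore uses made-up data only.

```python
import statistics


def midpoint(lower, upper):
    """Quartile point lying between two adjacent integer scores."""
    return (lower + upper) / 2


def iqr_from_scores(scores):
    """IQR from raw scores via statistics.quantiles (its own interpolation rule)."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


# Quartile positions from the 40-question test in the text.
q1 = midpoint(16, 17)
q3 = midpoint(31, 32)
print(q1, q3, q3 - q1)  # 16.5 31.5 15.0

# Hypothetical raw scores, for illustration only.
print(iqr_from_scores([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 4.0
```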
Descriptive Measures Other Specifications Practice Problems
Practice 1
Suppose a different 40-question test is given to 1000 students, and the results are much more closely concentrated than those from the test depicted in Fig. 4-2A. How would the IQR of this test compare with the IQR of the previous test?
Solution 1
The IQR would be smaller, because the 1st and 3rd quartiles would be closer together.
Practice 2
Recall the empirical rule from the previous chapter. It states that all normal distributions have the following three characteristics:
- Approximately 68% of the data points are within the range ±σ of μ.
- Approximately 95% of the data points are within the range ±2σ of μ.
- Approximately 99.7% of the data points are within the range ±3σ of μ.
Re-state this principle in terms of Z scores.
Solution 2
As defined above, the Z score of an element is the number of standard deviations by which the element departs from the mean, either positively or negatively. All normal distributions have the following three characteristics:
- Approximately 68% of the data points have Z scores between –1 and +1.
- Approximately 95% of the data points have Z scores between –2 and +2.
- Approximately 99.7% of the data points have Z scores between –3 and +3.
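The exact proportions behind these rounded figures can be checked with Python's standard-library `statistics.NormalDist`. Converting to Z scores turns any normal distribution into the standard normal distribution, which has mean 0 and standard deviation 1. So a single calculation covers every normal distribution. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

for k in (1, 2, 3):
    # Fraction of a normal distribution with Z scores between -k and +k.
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"|z| <= {k}: {share:.1%}")
# |z| <= 1: 68.3%
# |z| <= 2: 95.4%
# |z| <= 3: 99.7%
```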
From Statistics Demystified: A Self-Teaching Guide. Copyright © 2004 by The McGraw-Hill Companies. All Rights Reserved.