Data Intervals
The following problems involve data intervals including quartiles, deciles, percentiles, and straight fractional portions. Let's consider an example involving climate change. (The following scenario is fictitious, and is for illustrative purposes only. It shouldn't be taken as actual history.)
The Distribution
Suppose you want to know if the average temperature in the world has increased over the last 100 years. You obtain climate data for many cities and towns scattered throughout the world. You are interested in one figure for each location: the average temperature over the course of last year, versus the average temperature over the course of the year one century ago. The term "a century ago" or "100 years earlier" means "100 years before this year (99 years before last year)."
In order to calculate a meaningful figure for any given locale, you compare last year's average temperature t, expressed in degrees Celsius (°C), with the average annual temperature s during the course of the year a century ago. You figure the temperature change, T, in degrees Celsius simply as follows:
T = t – s
If t < s, then T is negative, indicating that the temperature last year was lower than the temperature a century ago. If t = s, then T = 0, meaning that the temperature last year was the same as the temperature a century ago. If t > s, then T is positive, meaning that the temperature last year was higher than the temperature a century ago.
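Here is a minimal sketch of the definition T = t – s and its three sign cases, written in Python. The function names `temperature_change` and `describe_change` are hypothetical. The readings in the usage line are made-up values, not data from the scenario.

```python
def temperature_change(t: float, s: float) -> float:
    """Return T = t - s in degrees Celsius.

    t: last year's average temperature (deg C)
    s: average temperature for the year a century ago (deg C)
    """
    return t - s


def describe_change(T: float) -> str:
    """Interpret the sign of T as described in the text."""
    if T < 0:
        return "cooler last year than a century ago"
    if T == 0:
        return "same as a century ago"
    return "warmer last year than a century ago"


# Hypothetical readings for one locale (not real data)
T = temperature_change(14.5, 13.2)
print(round(T, 1), describe_change(T))
```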
Now imagine that you have obtained data for so many different places that generating a table is impractical. Instead, you plot a graph of the number of locales that have experienced various average temperature changes between last year and a century ago, rounded off to the nearest tenth of a degree Celsius. Suppose the resulting smoothed-out curve looks like the graph of Fig. 8-10. We could generate a point-by-point graph made up of many short, straight line segments connecting points separated by 0.1°C on the horizontal scale, but that's not what we've done here. Instead, Fig. 8-10 is a smooth, continuous graph obtained by curve fitting.
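The counting step behind a graph like Fig. 8-10 can be sketched as follows. The sketch assumes the temperature changes are already available as a list of numbers. The helper `bin_changes` and the sample values are hypothetical. The sketch stops at the raw (T, count) points and does not attempt the curve fitting that produced the smooth graph.

```python
from collections import Counter


def bin_changes(changes: list[float]) -> Counter:
    """Count locales per temperature change, rounded to the nearest 0.1 deg C.

    Returns a Counter mapping rounded T -> number of locales, i.e. the
    (x, y) points a graph like Fig. 8-10 would be built from.
    """
    return Counter(round(T, 1) for T in changes)


# Hypothetical T values for a handful of locales (not real data)
sample = [-2.04, -1.96, 0.03, 1.31, 1.27, 2.79]
points = sorted(bin_changes(sample).items())
print(points)
```

With the sample above, the sorted points are (-2.0, 2), (0.0, 1), (1.3, 2), and (2.8, 1). Python's built-in `round` handles the nearest-tenth rounding. A value that sits exactly halfway between two tenths may not round the way a reader expects, because floats are stored in binary.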
Fig. 8-10. Illustration for Practice 1 through 7.
Practice 1
What do the points (–2,18) and (+2.8,7) on the graph represent?
Solution 1
The point (–2,18) tells us that there are 18 locales whose average annual temperatures were 2°C lower last year than a century ago. The point (+2.8,7) tells us that there are 7 locales whose average annual temperatures were 2.8°C higher last year than a century ago.
Practice 2
Suppose we are told that the temperature in the town of Thermington was higher last year by 1.3°C than it was a century ago. This fact is indicated by the vertical, dashed line in the graph of Fig. 8-10. Suppose L represents the proportion of the area under the curve (but above the horizontal axis showing temperatures) to the left of this dashed line, and R represents the proportion of the area under the curve to the right of the dashed line. What can be said about L and R?
Solution 2
Together, L and R make up the entire area under the curve with no overlap, so their sum is equal to 1. If L and R are given as percentages, then L + R = 100%.
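For a finite set of locales, the same idea can be checked by counting instead of measuring area. This sketch assumes that the fraction of locales on each side of the dashed line stands in for the fraction of area under the curve. The helper `split_proportions` and the sample values are hypothetical. Exact fractions are used so that L + R comes out as exactly 1.

```python
from fractions import Fraction


def split_proportions(changes: list[float], threshold: float) -> tuple[Fraction, Fraction]:
    """Return (L, R): shares of locales at/left of threshold and right of it.

    Values equal to the threshold are counted on the left, so every locale
    falls in exactly one region and L + R == 1.
    """
    n = len(changes)
    if n == 0:
        raise ValueError("need at least one locale")
    left = sum(1 for T in changes if T <= threshold)
    return Fraction(left, n), Fraction(n - left, n)


# Hypothetical T values (deg C); the threshold plays the role of the dashed line
sample = [-2.0, -0.5, 0.4, 1.3, 2.8]
L, R = split_proportions(sample, 1.3)
print(L, R, L + R)
```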
Practice 3
Suppose we are told that Thermington is exactly at the 81st percentile point in the distribution. What does this mean in terms of the areas of the regions L and R?
Solution 3
It means that L represents 81% of the area under the curve, and therefore that R represents 100% – 81%, or 19%, of the area under the curve.
Practice 4
Suppose we are told that Thermington is exactly at the 8th decile point in the distribution. What does this mean in terms of the areas of the regions L and R?
Solution 4
It means that L represents 8/10 of the area under the curve, and therefore that R represents 1 – 8/10, or 2/10, of the area under the curve.
Practice 5
Suppose we are told that Thermington is exactly at the 3rd quartile point in the distribution. What does this mean in terms of the areas of the regions L and R?
Solution 5
It means that L represents 3/4 of the area under the curve, and therefore that R represents 1 – 3/4, or 1/4, of the area under the curve.
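Practices 3 through 5 all follow one rule. A point at the k-th of n equal divisions of the area has L = k/n and R = 1 – k/n, where n = 100 for percentiles, 10 for deciles, and 4 for quartiles. Here is a small sketch of that rule, using a hypothetical helper `areas_at_quantile`:

```python
from fractions import Fraction


def areas_at_quantile(k: int, n: int) -> tuple[Fraction, Fraction]:
    """Return (L, R) for a point at the k-th of n equal divisions.

    n = 100 gives percentiles, n = 10 deciles, n = 4 quartiles.
    L is the share of the area left of the point; R = 1 - L.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    L = Fraction(k, n)
    return L, 1 - L


for label, k, n in [("81st percentile", 81, 100),
                    ("8th decile", 8, 10),
                    ("3rd quartile", 3, 4)]:
    L, R = areas_at_quantile(k, n)
    print(f"{label}: L = {L}, R = {R}")
```

For the 81st percentile, this prints L = 81/100 and R = 19/100. For the 8th decile, it prints L = 4/5 and R = 1/5, because `Fraction` reduces 8/10 to 4/5. For the 3rd quartile, it prints L = 3/4 and R = 1/4.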
Practice 6
Suppose we are told that Thermington is among the one-quarter of towns in the experiment that saw the greatest temperature increase between last year and 100 years ago. What does this mean in terms of the areas L and R?
Solution 6
The statement of this problem is ambiguous. It could mean either of two things:
(1) We are considering all the towns in the experiment.
(2) We are considering only those towns in the experiment that witnessed increases in temperature.
If we mean (1), then Thermington's position tells us that L represents more than 3/4 of the area under the curve, and therefore that R represents less than 1/4 of the area under the curve. If we mean (2), we can say only that R represents less than 1/4, or 25%, of the area under the curve to the right of the vertical axis (the axis in the center of the graph, showing the number of locations experiencing a given temperature change). We can't say anything about L unless we know more about the distribution.
It looks like the curve in Fig. 8-10 is symmetrical around the vertical axis. In fact, it's tempting to think that the curve is a normal distribution. But we haven't been told that this is the case, and we mustn't assume it without proof. Suppose we run the data through a computer and determine that the curve is a normal distribution centered on the vertical axis. Then half of the total area lies to the right of the vertical axis. If (2) above is true, R represents less than 1/4 of that half, or less than 1/8 of the total area under the curve, and so L represents more than 7/8 of the total area under the curve.
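Once the curve is known to be normal, the two interpretations lead to different cutoff points, which Python's `statistics.NormalDist` can locate. This sketch assumes a mean of 0 (the vertical axis). It also uses a placeholder standard deviation of 1, so the cutoffs come out in units of σ. The source does not give the actual σ for the data.

```python
from statistics import NormalDist

# Assumption: T is normal with mean 0 (centered on the vertical axis).
# sigma = 1.0 is a placeholder, so cutoffs are expressed in units of sigma.
dist = NormalDist(mu=0.0, sigma=1.0)

# Interpretation (1): top quarter of ALL towns -> cutoff where L = 3/4.
cutoff_all = dist.inv_cdf(0.75)

# Interpretation (2): top quarter of the towns with T > 0. Half the area lies
# right of 0, so the top quarter of that half is 1/8 of the total and the
# cutoff sits where L = 1 - 1/8 = 7/8.
share_right_of_zero = 1 - dist.cdf(0.0)  # 0.5 for a curve centered at 0
cutoff_warming = dist.inv_cdf(1 - share_right_of_zero / 4)

print(round(cutoff_all, 3), round(cutoff_warming, 3))
```

The cutoffs come out at roughly 0.67σ for interpretation (1) and 1.15σ for interpretation (2). A town to the right of the relevant cutoff satisfies the condition in Practice 6.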
Practice 7
Suppose that the curve in Fig. 8-10 represents a normal distribution, and that Thermington happens to lie exactly one standard deviation to the right of the vertical axis. What can be said about the areas L and R in this case?
Solution 7
Imagine the "sister city" of Thermington, a town called Frigidopolis. Suppose Frigidopolis was cooler last year, in comparison to 100 years ago, by exactly the same amount that Thermington was warmer. This situation is shown graphically in Fig. 8-11. Because Thermington corresponds to a point (or dashed vertical line) exactly one standard deviation to the right of the vertical axis, Frigidopolis can be represented by a point (or dashed vertical line) exactly one standard deviation to the left of the vertical axis.
Fig. 8-11. Illustration for Solution 7.
The mean (μ) in Fig. 8-11 happens to coincide with the vertical axis, or the point where the temperature change is 0. Recall the empirical rule concerning standard deviation (σ) and normal distributions, which we learned about in Chapter 3. The empirical rule applies directly to this problem. The vertical dashed line on the left, representing Frigidopolis, is –σ from μ. The vertical dashed line on the right, representing Thermington, is +σ from μ. The empirical rule tells us that the area under the curve between these two dashed lines is 68% of the total area under the curve, because these two lines bound the portion of the area within ±σ of μ. The proportion of the area under the curve between the vertical axis and either dashed line is half this, or 34%.
The fact that Fig. 8-11 represents a normal distribution also tells us that exactly 50% of the total area under the curve lies to the left of μ, and 50% of the area lies to the right of μ, if we consider the areas extending indefinitely to the left or the right. This is true because a normal distribution is always symmetrical with respect to the mean.
Knowing all of the above, we can determine that L = 50% + 34% = 84%. That means R = 100% – 84% = 16%.
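The empirical-rule figures in this solution can be cross-checked with Python's `statistics.NormalDist`. The sketch works in units of σ (mean 0, σ = 1). This is an assumption of convenience: only the one-standard-deviation distance matters, so the proportions do not depend on the actual value of σ.

```python
from statistics import NormalDist

# Assumption: work in units of sigma, with the mean (vertical axis) at 0.
dist = NormalDist(mu=0.0, sigma=1.0)

thermington = 1.0    # +1 sigma from the mean
frigidopolis = -1.0  # -1 sigma, the mirror image

# Area between the two dashed lines (the empirical rule's "68%")
within_one_sigma = dist.cdf(thermington) - dist.cdf(frigidopolis)

# L is all the area left of Thermington; R is the rest
L = dist.cdf(thermington)
R = 1 - L  # by symmetry, also equal to dist.cdf(frigidopolis)

print(f"between the dashed lines: {within_one_sigma:.2%}")
print(f"L = {L:.2%}, R = {R:.2%}")
```

The code prints about 68.27% between the dashed lines, L ≈ 84.13%, and R ≈ 15.87%. The empirical rule's 68%, 84%, and 16% are these values rounded to whole percentages.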
From Statistics Demystified: A Self-Teaching Guide. Copyright © 2004 by The McGraw-Hill Companies. All Rights Reserved.