Introduction to Confidence Intervals
A distribution can provide us with generalized data about populations, but it doesn't tell us much about any of the individuals in the population. Confidence intervals give us a better clue as to what we can expect from individual elements taken at random from a population.
The Scenario
Suppose we live in a remote scientific research outpost where the electric generators only produce 90 volts. The reason for the low voltage is the fact that the generators are old, inefficient, and too small for the number of people assigned to the station. Funding has been slashed, so new generators can't be purchased. There are too many people and not enough energy. (Sound familiar?)
We find it necessary to keep the compound well-lit at night, regardless of whatever other sacrifices we have to make. So we need bright light bulbs. We have obtained data sheets for 550-watt sodium-vapor lamps designed for 120-volt circuits, and these sheets tell us how much current we can expect the lamps to draw at various voltages. Suppose we have obtained the graph in Fig. 5-8, so we have a good idea of how much current each bulb will draw from our generators that produce 90 volts. The estimate of the mean, μ*, is 3.600 amperes. There are some bulbs that draw a little more than 3.600 amperes, and there are some that draw a little less. A tiny proportion of the bulbs draw a lot more or less current than average.
If we pick a bulb at random, which is usually what happens when anybody buys a single item from a large inventory, how confident can we be that the current our lamp draws will be within a certain range either side of 3.600 amperes?
68% Confidence Interval
Imagine that our data sheets tell us the standard deviation of the distribution shown in Fig. 5-8 is 0.230 amperes. According to the empirical rule, which we learned about in Chapter 3, 68% of the elements in a sample have a parameter that falls within one standard deviation (± σ) of the mean μ for that parameter in a normal distribution. We don't know the actual standard deviation σ or the actual mean μ for the lamps in our situation, but we have the estimates μ* = 3.600 amperes and σ* = 0.230 amperes, which we can use to get a good approximation of a confidence interval.

In our situation, the parameter is the current drawn at 90 volts. Therefore, 68% of the bulbs can be expected to draw current that falls in a range equal to the estimate of the mean plus or minus one standard deviation (μ* ± σ*). In Fig. 5-8, this range is 3.370 amperes to 3.830 amperes. It is a 68% confidence interval because, if we select a single bulb, we can be 68% confident that it will draw current in the range between 3.370 and 3.830 amperes when we hook it up to our sputtering, antiquated 90-volt generator.
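The arithmetic behind this interval can be sketched in a few lines of Python. The function name `interval` is our own choice for illustration; the values 3.600 and 0.230 are the estimates μ* and σ* given in the text.

```python
def interval(mean_est, sd_est, k):
    """Return (low, high) for mean_est plus or minus k standard deviations."""
    return (mean_est - k * sd_est, mean_est + k * sd_est)

# Estimates from the lamp data sheets, in amperes
low, high = interval(3.600, 0.230, 1)
print(f"68% confidence interval: {low:.3f} to {high:.3f} amperes")
```

Passing k = 2 or k = 3 to the same function gives the wider intervals based on two or three standard deviations.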
95% Confidence Interval
According to the empirical rule, 95% of the elements have a parameter that falls within two standard deviations of the mean for that parameter in a normal distribution. Again, we don't know the actual mean and standard deviation. We only have estimates of them, because the data is not based on tests of all the bulbs of this type that exist in the world. But we can use the estimates to get a good idea of the 95% confidence interval.
In our research-outpost scenario, 95% of the bulbs can be expected to draw current that falls in a range equal to the estimate of the mean plus or minus two standard deviations (μ* ± 2σ*). In Fig. 5-9, this range is 3.140 amperes to 4.060 amperes.

The 95% confidence interval is often quoted in real-world situations. You may hear that "there's a 95% chance that Ms. X will survive her case of cancer for more than one year," or that "there is a 95% probability that the eyewall of Hurricane Y will not strike Miami." If such confidence statements are based on historical data, we can regard them as reflections of truth. But the way we see them depends on where we are. If you have an inoperable malignant tumor, or if you live in Miami and are watching a hurricane prowling the Bahamas, you may take some issue with the use of the word "confidence" when talking about your future. Statistical data can look a lot different to us when our own lives are on the line, as compared to the situation where we are in some laboratory measuring the currents drawn by light bulbs.
When to be Skeptical
In some situations, the availability of statistical data can affect the very event the data is designed to analyze or predict. Cancer and hurricanes don't care about polls, but people do!
If you hear, for example, that there is a "95% chance that Dr. J will beat Mr. H in the next local mayoral race," take it with a dose of skepticism. There are inherent problems with this type of analysis, because people's reactions to the publication of predictive statistics can affect the actual event. If broadcast extensively by the local media, a statement suggesting that Dr. J has the election "already won" could cause overconfident supporters of Dr. J to stay home on election day, while those who favor Mr. H go to the polls in greater numbers than they would have if the data had not been broadcast. Or it might have the opposite effect, causing supporters of Mr. H to stay home because they believe they'll be wasting their time going to an election their candidate is almost certain to lose.
99.7% Confidence Interval
The empirical rule also states that in a normal distribution, 99.7% of the elements in a sample have a parameter that falls within three standard deviations of the mean for that parameter. From this fact we can develop an estimate of the 99.7% confidence interval.
In our situation, 99.7% of the bulbs can be expected to draw current that falls in a range equal to the estimate of the mean plus or minus three standard deviations (μ* ± 3σ*). In Fig. 5-10, this range is 2.910 amperes to 4.290 amperes.
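The percentages 68%, 95%, and 99.7% in the empirical rule are rounded figures. As a rough check, the sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) to compute the proportion of a normal distribution lying within one, two, and three standard deviations of the mean. It assumes the lamp currents really are normally distributed with the estimated values μ* = 3.600 and σ* = 0.230; the helper names `bounds` and `coverage` are ours.

```python
from statistics import NormalDist

MU_EST = 3.600     # estimate of the mean, in amperes
SIGMA_EST = 0.230  # estimate of the standard deviation, in amperes

dist = NormalDist(mu=MU_EST, sigma=SIGMA_EST)

def bounds(k):
    """Return the interval MU_EST plus or minus k * SIGMA_EST."""
    return (MU_EST - k * SIGMA_EST, MU_EST + k * SIGMA_EST)

def coverage(k):
    """Return the fraction of the distribution inside the k-sigma interval."""
    low, high = bounds(k)
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high = bounds(k)
    print(f"k = {k}: {low:.3f} to {high:.3f} amperes, "
          f"coverage {100 * coverage(k):.2f}%")
```

The computed coverages come out close to 68.27%, 95.45%, and 99.73%, which is why the empirical rule quotes 68%, 95%, and 99.7%.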

c% Confidence Interval
We can obtain any confidence interval we want, within reason, from a distribution when we have good estimates of the mean and standard deviation (Fig. 5-11). The width of the confidence interval, specified as a percentage c, is related to the number of standard deviations x either side of the mean in a normal distribution. This relationship takes the form of a function of x versus c.

When graphed for values of c ranging upwards of 50%, the function of x versus c for a normal distribution looks like the curve shown in Fig. 5-12. The curve "blows up" at c = 100%.
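For readers who want numbers rather than a curve, the function of x versus c can be approximated with the inverse of the normal cumulative distribution. The sketch below is one way to do it, assuming an exactly normal distribution; `sd_multiple` is a name we made up, and `NormalDist.inv_cdf` comes from Python's standard `statistics` module. A symmetric interval that contains a proportion p of the distribution leaves (1 − p)/2 in each tail, so its upper edge sits at the (1 + p)/2 quantile.

```python
from statistics import NormalDist

def sd_multiple(c_percent):
    """Return x, the number of standard deviations either side of the mean
    needed for a c% confidence interval in a normal distribution."""
    if not 0 < c_percent < 100:
        raise ValueError("c must be greater than 0% and less than 100%")
    p = c_percent / 100
    # Upper edge of the symmetric interval, in standard-deviation units
    return NormalDist().inv_cdf((1 + p) / 2)

for c in (50, 68, 90, 95, 99, 99.7):
    print(f"c = {c}%: x = {sd_multiple(c):.3f}")
```

The function rejects c = 100% because no finite number of standard deviations covers the whole of a normal distribution, which is the "blow up" seen in the graph. For c = 95%, the result is about 1.96, slightly less than the 2 used by the empirical rule.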

Inexactness and Impossibility
The foregoing calculations are never exact. There are two reasons for this.
First, unless the population is small enough so we can test every single element, we can only get estimates of the mean and standard deviation, never the actual values. This can be overcome by using good experimental practice when we choose our sample frame and/or samples.
Second, when the estimate of the standard deviation σ* is a sizable fraction of the estimate of the mean μ*, we get into trouble if we stray too many multiples of σ* either side of μ*. This is especially true as the parameter decreases. If we wander too far to the left (below μ*), we get close to zero and might even stumble into negative territory – for example, "predicting" that we could end up with a light bulb that draws less than no current! Because of this, confidence interval calculations work only when the span of values is a small fraction of the estimate of the mean. This is true in the cases represented above and by Figs. 5-8, 5-9, and 5-10. If the distribution were much flatter, or if we wanted a much greater degree of certainty, we would not be able to specify such large confidence intervals without modifying the formulas.
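A quick way to see this second limitation is to check where the lower end of an interval falls. The sketch below uses the lamp estimates from the text and, for contrast, a much flatter distribution with a standard deviation of 1.500 amperes. That second figure is purely hypothetical and was chosen only to show the problem; the function names are ours as well.

```python
def lower_bound(mean_est, sd_est, k):
    """Return the lower end of the interval mean_est minus k * sd_est."""
    return mean_est - k * sd_est

def multiples_until_zero(mean_est, sd_est):
    """Return how many standard deviations below the mean zero lies."""
    return mean_est / sd_est

MU_EST = 3.600  # estimate of the mean from the text, in amperes

# (label, standard deviation); the second value is hypothetical
cases = (("lamps in the text", 0.230), ("hypothetical flat curve", 1.500))

for label, sd in cases:
    low = lower_bound(MU_EST, sd, 3)
    verdict = "physically sensible" if low > 0 else "negative current, impossible"
    print(f"{label}: 3-sigma lower bound {low:.3f} amperes ({verdict}); "
          f"zero lies {multiples_until_zero(MU_EST, sd):.2f} sigma below the mean")
```

With the estimates from the text, zero lies more than 15 standard deviations below the mean, so the intervals discussed above are safe. With the hypothetical flatter curve, the three-standard-deviation interval already dips below zero.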



Sampling and Estimation Practice Problems
Practice 1
Suppose you set up 100 archery targets, each target one meter (1 m) in radius, and have thousands of people shoot millions of arrows at these targets from 10 m away. Each time a person shoots an arrow at the target, the radius r at which the arrow hits, measured to the nearest millimeter (mm) from the exact center point P of the bull's eye, is recorded and is fed into a computer. If an arrow hits to the left of a vertical line L through P, the radius is given a negative value; if an arrow hits to the right of L, the radius is given a positive value as shown in Fig. 5-13. As a result, you get a normal distribution in which the mean value, μ, of r is 0. Suppose you use a computer to plot this distribution, with r on the horizontal axis and the number of shots for each value of r (to the nearest millimeter) on the vertical axis. Suppose you run a program to evaluate this curve, and discover that the standard deviation, σ, of the distribution is 150 mm. This is not just an estimate, but is an actual value, because you record all of the shots.

Fig. 5-13. Illustration for Practice 1.
If you take a person at random from the people who have taken part in the experiment and have him or her shoot a single arrow at a target from 10 m away, what is the radius of the 68% confidence interval? The 95% confidence interval? The 99.7% confidence interval? What do these values mean? Assume there is no wind or other effect that would interfere with the experiment.
Solution 1
The radius of the 68% confidence interval is equal to σ, or 150 mm. This means that we can be 68% confident that our subject's shot will land within 150 mm of the center point, P. The radius of the 95% confidence interval is 2σ, or 300 mm. This means we can be 95% sure that the arrow will land within 300 mm of P. The radius of the 99.7% confidence interval is 3σ, or 450 mm. This means we can be 99.7% sure that the arrow will land within 450 mm of P.
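These answers can be checked informally with a simulation. The sketch below is not part of the original experiment: it draws 100,000 simulated values of r from a normal distribution with μ = 0 and σ = 150 mm, using Python's standard `random` module, and counts how many fall within each radius. The seed value, the number of shots, and the function name `fraction_within` are arbitrary choices of ours, and the simulation ignores the 1 m edge of the target.

```python
import random

SIGMA_MM = 150  # standard deviation from Practice 1, in millimeters

def fraction_within(radius_mm, shots):
    """Return the fraction of shots landing within radius_mm of the center."""
    hits = sum(1 for r in shots if abs(r) <= radius_mm)
    return hits / len(shots)

rng = random.Random(12345)  # fixed seed so that reruns give the same shots
shots = [rng.gauss(0, SIGMA_MM) for _ in range(100_000)]

for k, nominal in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    radius = k * SIGMA_MM
    share = 100 * fraction_within(radius, shots)
    print(f"within {radius} mm (nominal {nominal}): {share:.1f}% of shots")
```

The simulated proportions should land close to the nominal percentages, with small differences caused by random variation and by the rounding in the empirical rule.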
Practice 2
Draw a graph of the distribution resulting from the experiment described in Practice 1, showing the 68%, 95%, and 99.7% confidence intervals.
Solution 2
You should get a curve that looks like the one shown in Fig. 5-14.

Fig. 5-14. Illustration for Solution 2.
From Statistics Demystified: A Self-Teaching Guide. Copyright © 2004 by The McGraw-Hill Companies. All Rights Reserved.