Confidence Intervals for Proportions Study Guide

Updated on Oct 5, 2011

Introduction to Confidence Intervals for Proportions

Studies are conducted and samples are drawn to learn more about one or more populations. If the form and parameters of the population distribution were known, there would be no need to sample. Sampling gives us information on the parameters of the distribution, but without a census, the population parameters cannot be determined exactly. The statistic estimating the parameter is rarely equal to the parameter. How close is the statistic to the parameter it is estimating? Can statements be made that an estimate is within a certain distance of the parameter with a known probability of the statement being correct? We will answer these questions in this lesson.

Confidence Intervals for Proportions

Based on the Gallup poll on patriotism mentioned in the previous lesson, the proportion of U.S. adults who identify themselves as extremely or very patriotic is 0.72, and the margin of error was 0.03. Using this, we obtained an interval of values, from 0.69 to 0.75, that were plausible for the proportion of the U.S. adult population who characterize themselves as extremely or very patriotic. The interval 0.69 to 0.75 constituted a 95% confidence interval of the true population proportion. How was the margin of error computed? What do we do if we want some confidence level other than 95%? We will now outline the process of finding a confidence interval for a population proportion so that we can answer these questions.

Suppose the goal of a study is to estimate the population proportion p with (1 – α)100% confidence. Provided that the sample size is sufficiently large (i.e., np ≥ 10 and n(1 – p) ≥ 10), a confidence interval of the form $\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$, where $\hat{p}$ is the sample proportion, will have the desired level of confidence if z* is chosen so that P(– z* < z < z*) = 1 – α. To see this, we will have to put together several ideas from earlier lessons with the new concept of a confidence interval.

First, by the Central Limit Theorem, we know that, if the sample size n is sufficiently large, the sample proportion $\hat{p}$ is approximately normally distributed with mean p and standard deviation $\sqrt{p(1 - p)/n}$. Because p is unknown, the standard deviation of $\hat{p}$ is unknown, but the standard error of $\hat{p}$, $\sqrt{\hat{p}(1 - \hat{p})/n}$, provides an estimate of the standard deviation.

Using the properties of the normal distribution and again assuming the sample size is large enough, we can transform $\hat{p}$ to an approximate standard normal random variable z by subtracting the mean and dividing by the standard error; that is:

$$z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$

Note that we have divided by the standard error of $\hat{p}$ instead of the standard deviation of $\hat{p}$. How large must the sample size be for the normal approximation to be adequate? The guidelines are the same as those we had for invoking the Central Limit Theorem in Lesson 13. If np ≥ 10 and n(1 – p) ≥ 10, the normal distribution provides a good approximation of the distribution of $\hat{p}$. Because p is unknown, we use $\hat{p}$ to check the conditions.
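The sample-size check can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson; the function name `large_sample_ok` and its parameters are hypothetical. It relies on the fact that n times the sample proportion is simply the observed number of successes, and n times one minus the sample proportion is the observed number of failures.

```python
def large_sample_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Check the large-sample guideline using the sample proportion.

    n * p_hat equals the number of successes and n * (1 - p_hat) equals
    the number of failures, so both counts are compared to the threshold
    directly (this also avoids floating-point rounding).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical samples: 30 successes out of 50, and 5 successes out of 50.
print(large_sample_ok(30, 50))  # both counts are at least 10
print(large_sample_ok(5, 50))   # only 5 successes, so the check fails
```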

We also know that, for a standard normal distribution, we can find z* such that a specified percentage of the population values are between – z* and z*; the probability that a randomly selected value of z will be between – z* and z* is the confidence level. Figure 15.1 illustrates the relationship between z* and the confidence level.

Figure 15.1

Table 15.1 provides z* values for the most common levels of confidence.

Table 15.1 z-values
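Values of z* for any confidence level can also be computed rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_star` is an assumption made for illustration, and the three levels shown are common choices, not necessarily the rows of the table above. Because the area between – z* and z* is 1 – α, the area to the left of z* is 1 – α/2.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Return z* such that P(-z* < z < z*) equals the confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    # The upper tail beyond z* has area alpha / 2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
# 90% confidence: z* = 1.645
# 95% confidence: z* = 1.960
# 99% confidence: z* = 2.576
```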

Combining all of the above ideas, we have:

$$P\left(-z^* < \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} < z^*\right) = 1 - \alpha$$

Using algebra, we can rewrite the equation as:

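$$P\left(\hat{p} - z^* \sqrt{\hat{p}(1 - \hat{p})/n} < p < \hat{p} + z^* \sqrt{\hat{p}(1 - \hat{p})/n}\right) = 1 - \alpha$$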
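The algebra inside the probability statement can be sketched step by step. Here SE is shorthand, introduced only for this sketch, for the standard error $\sqrt{\hat{p}(1 - \hat{p})/n}$:

```latex
\begin{align*}
-z^* &< \frac{\hat{p} - p}{\mathrm{SE}} < z^* \\
-z^*\,\mathrm{SE} &< \hat{p} - p < z^*\,\mathrm{SE}
  && \text{(multiply through by } \mathrm{SE} > 0\text{)} \\
-\hat{p} - z^*\,\mathrm{SE} &< -p < -\hat{p} + z^*\,\mathrm{SE}
  && \text{(subtract } \hat{p}\text{)} \\
\hat{p} - z^*\,\mathrm{SE} &< p < \hat{p} + z^*\,\mathrm{SE}
  && \text{(multiply by } -1\text{, which reverses the inequalities)}
\end{align*}
```

Each step is an equivalent statement about the same event, so the probability stays at 1 – α throughout.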

The limits on the inequality are the same as the confidence interval limits that we stated previously. Notice that the form of the confidence interval is point estimate ± multiplier × standard error.

For proportions, the point estimate is $\hat{p}$, the multiplier is the value of z* corresponding to the desired confidence level, and the standard error is $\sqrt{\hat{p}(1 - \hat{p})/n}$. This general form will be seen again when we set confidence intervals on the mean.
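As a rough illustration of the general form, the following Python sketch computes a confidence interval for a proportion. The function name `proportion_ci` is hypothetical, and the sample size n = 1000 used in the example is an assumption chosen for illustration; the lesson does not state the actual sample size of the Gallup poll.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Return (lower, upper, margin) for point estimate +/- z* x standard error."""
    if n <= 0 or not 0 <= p_hat <= 1:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    # Multiplier: z* for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the sample proportion.
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin, margin


# p_hat = 0.72 comes from the poll; n = 1000 is an ASSUMED sample size.
low, high, margin = proportion_ci(0.72, 1000)
print(f"interval: {low:.2f} to {high:.2f}, margin of error: {margin:.3f}")
```

Under that assumed sample size, the margin of error is a little under 0.03 and the interval rounds to 0.69 to 0.75, which is consistent with the figures quoted for the poll. Passing a different `confidence` value, such as 0.90 or 0.99, changes only the multiplier z*.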

Although it has been stated several times before, it is important to remember that the population proportion p is fixed. The confidence interval depends on the sample, and the limits of the confidence interval vary with the sample. To illustrate this, suppose we repeatedly draw samples of n = 50 from a population with p = 0.6. A 95% confidence interval for p is found for each sample. The confidence intervals from 100 of these samples are shown in Figure 15.2. The line segments represent the confidence intervals. Notice that four of the segments do not cross the vertical line at p = 0.6. That means 96 of the 100, or 96%, do include the population proportion 0.6. This is close to the predicted 95%. As the number of samples gets larger, the observed percentage will tend to get closer to the specified confidence level of 95%. Notice that the ends of the confidence intervals (the confidence limits) change with the sample, but the population proportion does not change.
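The repeated-sampling experiment described above can be imitated with a short simulation. This is a sketch under stated assumptions, not the procedure used to produce the figure: the function name `coverage`, the seed, and the use of Python's `random` module are all choices made here for illustration, so the counts it produces will not match the figure exactly.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p: float = 0.6, n: int = 50, samples: int = 100,
             confidence: float = 0.95, seed: int = 0) -> float:
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(samples):
        # Draw one sample of size n; each draw is a success with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        # The interval changes from sample to sample; p stays fixed.
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / samples


print(coverage(samples=100))    # comparable to the 100 intervals in the figure
print(coverage(samples=10000))  # more samples give a steadier estimate
```

With only 100 samples the observed fraction can wander a few points from 95%; increasing `samples` shows the long-run behavior of the procedure more clearly.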

Figure 15.2
