Confidence Intervals for Proportions Study Guide
Introduction to Confidence Intervals for Proportions
Studies are conducted and samples are drawn to learn more about one or more populations. If the form and parameters of the population distribution were known, there would be no need to sample. Sampling gives us information on the parameters of the distribution, but without a census, the population parameters cannot be determined exactly. The statistic estimating the parameter is rarely equal to the parameter. How close is the statistic to the parameter it is estimating? Can statements be made that an estimate is within a certain distance of the parameter with a known probability of the statement being correct? We will answer these questions in this lesson.
Confidence Intervals for Proportions
Based on the Gallup poll on patriotism mentioned in the previous lesson, the proportion of U.S. adults who identify themselves as extremely or very patriotic is 0.72, and the margin of error was 0.03. Using this, we obtained an interval of values, from 0.69 to 0.75, that were plausible for the proportion of the U.S. adult population who characterize themselves as extremely or very patriotic. The interval 0.69 to 0.75 constituted a 95% confidence interval of the true population proportion. How was the margin of error computed? What do we do if we want some confidence level other than 95%? We will now outline the process of finding a confidence interval for a population proportion so that we can answer these questions.
Suppose the goal of a study is to estimate the population proportion $p$ with $(1 - \alpha)100\%$ confidence. Provided that the sample size is sufficiently large (i.e., $np \ge 10$ and $n(1 - p) \ge 10$), a confidence interval of the form $\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$ will have the desired level of confidence if $z^*$ is chosen so that $P(-z^* < z < z^*) = 1 - \alpha$. To see this, we will have to put together several ideas from earlier lessons with the new concept of a confidence interval.
First, by the Central Limit Theorem, we know that, if the sample size $n$ is sufficiently large, the sample proportion $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1 - p)/n}$. Because $p$ is unknown, the standard deviation of $\hat{p}$ is unknown, but the standard error of $\hat{p}$, $\sqrt{\hat{p}(1 - \hat{p})/n}$, provides an estimate of the standard deviation.
Using the properties of the normal distribution and again assuming the sample size is large enough, we can transform $\hat{p}$ to an approximate standard normal random variable $z$ by subtracting the mean and dividing by the standard error; that is:
$$z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
Note that we have divided by the standard error of $\hat{p}$ instead of the standard deviation of $\hat{p}$. How large must the sample size be for the normal approximation to be adequate? The guidelines are the same as those we had for invoking the Central Limit Theorem in Lesson 13. If $np \ge 10$ and $n(1 - p) \ge 10$, the normal distribution provides a good approximation of the distribution of $\hat{p}$. Because $p$ is unknown, we use $\hat{p}$ in its place to check the conditions.
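The large-sample check can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name large_sample_ok and the sample counts are hypothetical. It relies on the fact that $n\hat{p}$ is the number of successes in the sample and $n(1 - \hat{p})$ is the number of failures.

```python
def large_sample_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True when n*p_hat and n*(1 - p_hat) both reach the threshold.

    Hypothetical helper: n*p_hat equals the success count and
    n*(1 - p_hat) equals the failure count, so we compare counts directly.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Illustrative (made-up) samples of size 50.
print(large_sample_ok(30, 50))  # 30 successes and 20 failures
print(large_sample_ok(46, 50))  # 46 successes but only 4 failures
```

Comparing counts rather than multiplying $n$ by a rounded $\hat{p}$ avoids small floating-point surprises near the threshold.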
We also know that, for a standard normal distribution, we can find z* such that a specified percentage of the population values are between – z* and z*; the probability that a randomly selected value of z will be between – z* and z* is the confidence level. Figure 15.1 illustrates the relationship between z* and the confidence level.
Table 15.1 provides z* values for the most common levels of confidence.
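For a confidence level that is not in the table, z* can be computed from the standard normal distribution. The sketch below uses Python's standard-library statistics.NormalDist; the function name z_star is an assumption made for illustration, and the printed values are computed here rather than copied from Table 15.1.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Multiplier z* with P(-z* < z < z*) equal to the confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so z* is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The 90% and 95% results agree with the multipliers 1.645 and 1.96 used in the examples in this lesson.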
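Combining all of the above ideas, we have:
$$P\left(-z^* < \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} < z^*\right) = 1 - \alpha$$
Using algebra, we can rewrite the equation as:
$$P\left(\hat{p} - z^*\sqrt{\hat{p}(1 - \hat{p})/n} < p < \hat{p} + z^*\sqrt{\hat{p}(1 - \hat{p})/n}\right) = 1 - \alpha$$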
The limits on the inequality are the same as the confidence interval limits that we stated previously. Notice that the form of the confidence interval is point estimate ± multiplier × standard error.
For proportions, the point estimate is $\hat{p}$, the multiplier is the value of $z^*$ corresponding to the desired confidence level, and the standard error is $\sqrt{\hat{p}(1 - \hat{p})/n}$. This general form will be seen again when we set confidence intervals on the mean.
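The general form, point estimate plus or minus multiplier times standard error, translates directly into code. The following is a minimal sketch under the large-sample assumptions of this lesson; the function name proportion_ci and the counts in the usage line (120 successes in 200 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                               # point estimate
    std_error = sqrt(p_hat * (1 - p_hat) / n)           # standard error of p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # multiplier z*
    margin = z * std_error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes in 200 trials, 95% confidence.
lower, upper = proportion_ci(120, 200, 0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The function does not check the conditions $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$; that check should be made before the result is trusted.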
Although it has been stated several times before, it is important to remember that the population proportion p is fixed. The confidence interval depends on the sample, and the limits of the confidence interval vary with the sample. To illustrate this, suppose we repeatedly draw samples of n = 50 from a population with p = 0.6. A confidence interval on p is found for each sample. The results of the confidence intervals from 100 of these samples are shown in Figure 15.2. The line segments represent the confidence intervals. Notice that four of the segments do not cross the vertical line at p = 0.6. That means 96 of the 100, or 96%, do include the population proportion 0.6. This is close to the predicted 95%. As the number of samples gets larger, the observed percentage will tend to get closer to the specified confidence level of 95%. Notice, the ends of the confidence intervals (the confidence limits) change with the sample, but the population proportion does not change.
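The repeated-sampling idea can be tried out with a short simulation. This sketch is not the procedure used to produce Figure 15.2; it simply draws many samples of n = 50 from a population with p = 0.6, builds a 95% interval from each, and reports the share of intervals that contain p. The function name coverage, the number of samples, and the random seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p=0.6, n=50, confidence=0.95, n_samples=10_000, seed=1):
    """Fraction of simulated confidence intervals that contain the fixed p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(n_samples):
        # Each trial is a success with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        # The limits change from sample to sample; p stays fixed.
        if p_hat - margin < p < p_hat + margin:
            hits += 1
    return hits / n_samples


print(f"share of intervals containing p: {coverage():.3f}")
```

With a large number of samples, the observed share should settle near the nominal 95%, though for a sample size as small as 50 the normal approximation can leave it slightly off.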
A company wants to know what proportion of the bell pepper seeds it sells will germinate. One hundred seeds are randomly selected from the company's inventory. They are placed in ideal conditions for germination. After two weeks, 78 of the seeds had germinated. Set a 90% confidence interval for the proportion of seeds in inventory that would germinate under ideal conditions. Interpret the interval in the context of the problem.
First, we need to determine whether the conditions are satisfied for us to use a normal approximation to find the interval.
$n\hat{p} = 100(0.78) = 78 > 10$ and $n(1 - \hat{p}) = 100(1 - 0.78) = 22 > 10$
Because both $n\hat{p}$ and $n(1 - \hat{p})$ are greater than 10, we can use a large-sample confidence interval based on the normal distribution.
Second, we know that the standard error of $\hat{p}$ is $\sqrt{0.78(1 - 0.78)/100} \approx 0.0414$. This would have been true even if the sample size had not been large enough to use a large-sample confidence interval.
Next, we need to find z* such that P( –z* < z < z*) = 0.90. From the table of common z* values provided earlier, z* = 1.645 is the multiplier for a 90% confidence interval.
The limits of the confidence interval are then $0.78 \pm 1.645\sqrt{0.78(0.22)/100}$, or $0.78 \pm 0.068$. The confidence interval is (0.712, 0.848).
Interpretation: We are 90% confident that between about 71% and 85% of the bell pepper seeds in this company's inventory would germinate under ideal conditions.
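The arithmetic for the seed example can be checked step by step. The sketch below uses only the figures stated in the problem (78 of 100 seeds germinated, z* = 1.645 for 90% confidence); the variable names are chosen for illustration.

```python
from math import sqrt

n = 100            # seeds sampled
germinated = 78    # seeds that germinated
z_star = 1.645     # multiplier for 90% confidence

p_hat = germinated / n
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error
lower, upper = p_hat - margin, p_hat + margin

# Conditions for the normal approximation: both counts should be at least 10.
print(f"n*p_hat = {n * p_hat:.0f}, n*(1 - p_hat) = {n * (1 - p_hat):.0f}")
print(f"standard error = {standard_error:.4f}")
print(f"margin of error = {margin:.3f}")
print(f"90% CI = ({lower:.3f}, {upper:.3f})")
```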
Sample Sizes, Confidence Level, and Length of Confidence Intervals
Recall that the general form of a confidence interval is point estimate ± multiplier × standard error. The length of the confidence interval is 2 × multiplier × standard error; the half-length of the confidence interval is multiplier × standard error. From Lesson 13, we know that as the sample size increases, the standard error decreases, so the length of the confidence interval decreases. Notice from the table of z* values that the value of the multiplier increases as the confidence level increases, making the confidence interval longer. The fact that the confidence interval gets longer as the confidence level increases holds for other forms of intervals as well.
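Both relationships can be seen by tabulating the interval length, 2 × multiplier × standard error, for a few sample sizes and confidence levels. This is an illustrative sketch: the function name interval_length, the fixed sample proportion of 0.5, and the sample sizes are assumptions, not values from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(p_hat: float, n: int, confidence: float) -> float:
    """Length of the large-sample interval: 2 * z* * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.5  # hypothetical sample proportion, held fixed for comparison
for n in (100, 400, 1600):
    lengths = ", ".join(
        f"{level:.0%}: {interval_length(p_hat, n, level):.3f}"
        for level in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:4d} -> {lengths}")
```

Reading across a row shows the length growing with the confidence level; reading down a column shows it shrinking as n grows. Because the standard error depends on $\sqrt{n}$, quadrupling the sample size halves the length.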
In the last example, we set a 90% confidence interval on the proportion of seeds in inventory that would germinate under ideal conditions. Now set a 95% confidence interval on this same proportion. Compare the two intervals.
Changing the confidence level has no effect on whether or not the conditions for inference are satisfied. We still have $n\hat{p} = 100(0.78) = 78 > 10$ and $n(1 - \hat{p}) = 100(1 - 0.78) = 22 > 10$, so we can use the normal distribution to approximate the sampling distribution of $\hat{p}$. Because the sample size has not changed, the standard error is the same, $\sqrt{0.78(0.22)/100} \approx 0.0414$. However, the multiplier z* is different. We must have 0.025 probability in each tail instead of the 0.05 in each tail that corresponded to a 90% confidence interval. From the table, we have z* = 1.96. Thus, the 95% confidence interval is $0.78 \pm 1.96(0.0414)$, or $0.78 \pm 0.081$, compared to the 90% confidence interval of $0.78 \pm 1.645(0.0414)$, or $0.78 \pm 0.068$. Clearly, the 95% confidence interval is wider than the 90% confidence interval. It makes sense that, as the interval increases in length, we become more confident that the interval will capture the true population proportion.
We should also note that the 95% confidence interval ranges from 0.699 to 0.861. Both limits lie between 0 and 1, as they must, because p cannot be less than 0 or greater than 1. Had a computed limit fallen outside that range, we would replace it with the nearest admissible value. We would say that we are 95% confident that the proportion of bell pepper seeds in inventory that would sprout under optimal conditions is between about 70% and 86%.
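The comparison of the two confidence levels can be reproduced from the problem data alone (78 of 100 seeds). The sketch below also truncates the limits to the admissible range from 0 to 1; the function name clipped_ci is hypothetical, and truncation is shown as one simple convention rather than the only way to handle limits outside that range.

```python
from math import sqrt
from statistics import NormalDist


def clipped_ci(successes: int, n: int, confidence: float):
    """Large-sample interval for a proportion, truncated to [0, 1]."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    # A proportion cannot be below 0 or above 1, so clip the limits.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


for level in (0.90, 0.95):
    lower, upper = clipped_ci(78, 100, level)
    print(f"{level:.0%} CI: ({lower:.3f}, {upper:.3f}), length {upper - lower:.3f}")
```

For these data neither limit needs truncating, and the 95% interval comes out wider than the 90% interval because only the multiplier changes.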
Confidence Intervals for Proportions In Short
Confidence intervals provide a plausible set of values for the unknown population parameter of interest. Appealing either to normality or the Central Limit Theorem, confidence intervals have the form point estimate ± multiplier × standard error.
For proportions, the point estimate is $\hat{p}$, and the standard error is $\sqrt{\hat{p}(1 - \hat{p})/n}$. The multiplier $z^*$ is chosen so that the interval has the desired level of confidence. Relationships exist among the length of the confidence interval, the sample size, and the level of confidence.
Find practice problems and solutions for these concepts at Confidence Intervals for Proportions Practice Questions.