Introduction to Hypothesis Testing for Proportions
A confidence interval on the population proportion provides a set of plausible values for that proportion as we saw in the last lesson. If the proportion is hypothesized to be, say, 0.4, but the interval does not include 0.4, then it would be reasonable to reject that value for the population proportion. We will not always want to assess the validity of hypotheses using a confidence interval. In this lesson, the logic of statistical hypothesis testing and the application of this logic to proportions will be presented.
Logic of Hypothesis Testing
To conduct a statistical test of hypotheses, we must first have two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis, denoted by H_{0}, is a statement that nothing is happening. The specific null hypothesis varies depending on the problem. It could be that medication does not alter blood pressure, that no relationship exists between IQ and grades, or that hair grows at the same rate for females and males. The alternative hypothesis, denoted by H_{a}, is a statement that something is happening. As with the null hypothesis, the alternative hypothesis depends on the problem. It could be that medication changes blood pressure, that a relationship does exist between IQ and grades, or that hair grows at different rates for females and males.
For a moment, consider a trial. The null hypothesis at every trial is H_{0}: The defendant is not guilty. That is, she did not do whatever she is accused of. The alternative hypothesis is always H_{a}: The defendant is guilty. That is, she did commit the crime of which she is accused. In the U.S. judicial system, the jury is instructed that the null hypothesis of not guilty can be rejected in favor of the alternative of guilty only if such a conclusion can be drawn beyond a reasonable doubt; the evidence must be strong enough to convince the jurors that the null is not true. Statistical hypothesis testing is also firmly based on the idea of rejecting the null hypothesis only if there is strong evidence against it.
For any test of hypotheses, two types of errors are possible, type I errors and type II errors. A type I error occurs if the null hypothesis is rejected when it is true. For the trial example, a type I error is committed if the jury declares an innocent person guilty. A type II error occurs if the null hypothesis is not rejected, but it is false (the alternative is true). If a guilty person is declared innocent, then a type II error has been made in a jury trial.
In most hypothesis testing settings, we can never be absolutely sure whether the null or the alternative is true. Again thinking of the jury trial, we can never be certain whether the defendant is innocent or guilty. (A confession may not be true; a witness may lie.) The null hypothesis that the person is innocent is rejected only if the evidence presented in the case makes jurors believe that having that much or more evidence against her would be extremely unlikely if she were innocent. In statistical hypothesis testing, the p-value is the probability of observing an outcome as unusual as or more unusual than the one that was observed, given that the null hypothesis is true. If the p-value is too small, then we reject the null hypothesis in favor of the alternative, just as the jurors would reject the hypothesis of innocence and conclude guilty. However, in the case of the statistical test, we have a number, the p-value, which gets smaller as the evidence against the null hypothesis increases.
The significance level of a test, or the α level, is the borderline for deciding whether the p-value is small enough to justify rejecting the null hypothesis in favor of the alternative hypothesis. If the p-value is smaller than the significance level, the null hypothesis is rejected; otherwise, it is not. The significance level is the largest acceptable probability of a type I error.
The standard for rejecting the null hypothesis is quite high. For a jury to declare the defendant guilty, the evidence must be strong enough to remove any reasonable doubt from the minds of the jurors. As a consequence, some guilty people will be found not guilty only because a reasonable doubt remained. Similarly, if the p-value is above the significance level and the null is not rejected, this does not mean that the null hypothesis is true; we simply do not have enough evidence to say it is not true. This is why, in statistical hypothesis testing, we say that we have "failed to reject the null hypothesis," but we would not conclude that the null hypothesis is true.
Example
A farmer wants to determine whether or not his field needs to be treated to control the number of insects in it. Because of costs and environmental concerns, he wants to be sure that treatment is required before proceeding. Based on this, answer the following questions.
 State the null and alternative hypotheses.
 How could the farmer make a type I error and what would the consequences be?
 How could the farmer make a type II error and what would the consequences be?
Solution
 Because the farmer wants to be certain that treatment is necessary before treating to control insects, this action (of treatment) is the alternative hypothesis. Thus, the null hypothesis is H_{0}: Do not treat the field, and the alternative hypothesis is H_{a}: Treat the field.
 A type I error would occur if the farmer treated the field when it should not have been treated (rejected a true null hypothesis). This would cause him to spend money unnecessarily on treatment, reducing his profits for the season. The potential for negative environmental impacts is also present.
 A type II error would occur if the farmer did not treat the field when it should have been treated (failed to reject a false null hypothesis). This would result in lower production and thus a reduction in profits.
Conducting Hypothesis Tests on Proportions
We will follow five steps in conducting a hypothesis test. Each of these steps will be discussed and applied to proportions in this section.
Step 1: Specifying the Hypotheses
In science, a research hypothesis is a specific, testable prediction made about outcomes of a study. The scientist hopes that the results of the study validate the prediction. When establishing a pair of statistical hypotheses (H_{0} and H_{a}), the research hypothesis is made the alternative hypothesis. To understand why, think back to the parallel we have been using with a jury trial. If the jury believes that strong evidence exists against the assumption of not guilty, then they reject that assumption and conclude that the defendant is guilty. It is not necessary to prove innocence; it is only necessary to raise doubt about guilt to conclude not guilty. Similarly, with statistical hypotheses, if we reject the null hypothesis, we accept the alternative hypothesis. If sufficient evidence does not exist to reject the null hypothesis, we do not accept the null; we fail to reject it, which is a much weaker conclusion.
When working with proportions, the null hypothesis is that the population proportion p is equal to some proportion p_{0}. The alternative may be that p is less than, greater than, or not equal to p_{0}, depending on what the research hypothesis is.
Step 2: Verify Necessary Conditions for a Test and, if Satisfied, Construct the Test Statistic
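The conditions for testing hypotheses about the population proportion p are the same as those for constructing a confidence interval on this parameter. They are (1) the sample was randomly selected and (2) the sample is sufficiently large, that is, np ≥ 10 and n(1 – p) ≥ 10. Because the test statistic is constructed assuming the null hypothesis is true, the hypothesized value p_{0} is used for p when checking the second condition.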
By the Central Limit Theorem, if n is sufficiently large, the sample proportion \hat{p} is approximately normally distributed with mean p and standard deviation \sqrt{p(1 - p)/n}. Standardizing \hat{p}, we have that z = (\hat{p} - p) / \sqrt{p(1 - p)/n} is approximately standard normal. If the null hypothesis is true and p = p_{0}, the test statistic z_{T} = (\hat{p} - p_{0}) / \sqrt{p_{0}(1 - p_{0})/n} is approximately distributed as a standard normal random variable. Note: The test statistic is always constructed assuming that the null hypothesis is true.
Notice that the test statistic has the form (estimate - hypothesized value) / (standard error of the estimate).
The test statistics we will encounter in this book all have this form. They are standardized random variables whose distributions we know if the null hypothesis is true.
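As a minimal sketch of Step 2, the condition check and the test statistic for a proportion can be written as two small functions. The function names and the numbers used to exercise them are illustrative assumptions, not part of the lesson; the large-sample check uses the hypothesized value p_{0} because the statistic is built assuming the null hypothesis is true.

```python
import math


def conditions_met(n, p0, minimum=10):
    """Large-sample check under H0: n*p0 and n*(1 - p0) are both at least `minimum`."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


def z_statistic(p_hat, p0, n):
    """(estimate - hypothesized value) / standard error, computed under H0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Hypothetical numbers, chosen only to exercise the functions.
n, p0, p_hat = 200, 0.4, 0.46
if conditions_met(n, p0):
    print(f"z_T = {z_statistic(p_hat, p0, n):.2f}")
else:
    print("Sample too small for the normal approximation.")
```

Keeping the condition check separate from the statistic mirrors the order of the step: verify first, then construct.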
Step 3: Find the p-Value Associated with the Test Statistic
If the null hypothesis is true, the test statistic has an approximate standard normal distribution. If the null hypothesis is not true, the test statistic does not have an approximate standard normal distribution and is more likely to assume a value that is "unusual" for a random observation from a standard normal. The p-value is the probability of observing a value as extreme as or more extreme than z_{T} in a random selection from the standard normal distribution.
How do we measure how unusual a test statistic is? It depends on the alternative hypothesis. These are summarized in Figures 16.1, 16.2, and 16.3.
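The three cases differ only in which tail of the standard normal distribution counts as "more extreme." The sketch below uses the standard conventions: the lower tail when the alternative is p < p_{0}, the upper tail when it is p > p_{0}, and both tails when it is p ≠ p_{0}. The function name and the string labels are assumptions made for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z_t, alternative):
    """Tail probability of the standard normal that matches the alternative."""
    if alternative == "less":        # Ha: p < p0, small z_T is unusual
        return STANDARD_NORMAL.cdf(z_t)
    if alternative == "greater":     # Ha: p > p0, large z_T is unusual
        return 1 - STANDARD_NORMAL.cdf(z_t)
    if alternative == "two-sided":   # Ha: p != p0, large |z_T| is unusual
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z_t)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# The same test statistic gives different p-values under different alternatives.
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(1.5, alt), 4))
```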
Step 4: Decide Whether or Not to Reject the Null Hypothesis
Before beginning the study, the significance level of the test is set. The significance level α is the largest acceptable probability of a type I error. If the p-value is less than α, the null hypothesis is rejected; otherwise, the null is not rejected. In statistical hypothesis testing, we control the probability of a type I error. We often do not know the probability of a type II error. If we reject the null hypothesis, we know the probability of making an error. If we do not reject the null hypothesis, we do not know the probability of having made an error. This is why we would not accept the null hypothesis; we only fail to reject it. In science, a significance level of α = 0.05 is generally the standard. That is, we reject H_{0} if the p-value is less than α = 0.05. A p-value less than α = 0.01 is usually viewed as highly significant. However, the researcher can set the significance level that is most appropriate for his or her study.
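The claim that the type I error probability is controlled can be checked with a small simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis is true and counts how often an upper-tailed test rejects it. The sample size, p_{0}, number of trials, and seed are hypothetical choices; the rejection rate should come out near α, though not exactly equal to it, because counts are discrete and the normal distribution is only an approximation.

```python
import math
import random
from statistics import NormalDist


def upper_tail_p_value(successes, n, p0):
    """p-value for Ha: p > p0 using the normal approximation."""
    z_t = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z_t)


def simulated_type_i_rate(n, p0, alpha, trials, seed=0):
    """Fraction of samples, drawn with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # One sample of size n from a population whose true proportion is p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        if upper_tail_p_value(successes, n, p0) < alpha:
            rejections += 1
    return rejections / trials


rate = simulated_type_i_rate(n=100, p0=0.5, alpha=0.05, trials=20000)
print(f"Rejected a true null in {rate:.1%} of samples")
```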
Step 5: State Conclusions in the Context of the Study
Statistical tests of hypotheses are conducted to determine whether or not sufficient evidence exists to reject the null hypothesis in favor of the alternative hypothesis. Once the decision is made to reject or not to reject the null hypothesis, it is important to state what conclusions have been drawn.
Example of Conducting Hypothesis Tests on Proportions
Example:
A sleep researcher believes that most (more than half) of all college students take naps during the afternoon or early evening. He randomly selects 60 students from a large university. He asks each selected student, "Do you regularly take naps during the afternoon or early evening?" Of the 60 students, 34 responded yes. Does sufficient statistical evidence exist to conclude that more than half of the students at this university regularly take afternoon or early evening naps?
Solution:
We will follow the five steps of hypothesis testing.
Step 1: Specifying the Hypotheses
The parameter of interest in the study is p, the proportion of students at this university who regularly take afternoon or early evening naps. The sleep researcher believes more than 50% of the students take afternoon or early evening naps regularly, so this is the alternative hypothesis. Thus, the set of hypotheses to be tested is:
H_{0}: p = 0.50
H_{a}: p > 0.50
Note: Equality appears in H_{0}. This is necessary to know the distribution of the test statistic under H_{0}. Also, a one-sided alternative (p > 0.50) is used instead of a two-sided alternative (p ≠ 0.50). The reason for this is that the sleep scientist wants to conclude that more than half of the students take naps, not that some proportion other than half take naps (the meaning of the two-sided alternative here).
Step 2: Verify Necessary Conditions for a Test and, if Satisfied, Construct the Test Statistic
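The sleep researcher took a random sample of students from those attending the university, so the first condition for inference is satisfied. To check the second condition, we have np_{0} = 60(0.50) = 30 ≥ 10 and n(1 – p_{0}) = 60(0.50) = 30 ≥ 10. Thus, the second condition for inference is also satisfied. Note that \hat{p} = 34/60 = 0.567.
The test statistic is
z_{T} = (\hat{p} - p_{0}) / \sqrt{p_{0}(1 - p_{0})/n} = (34/60 - 0.50) / \sqrt{0.50(0.50)/60} = 1.03.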
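The arithmetic for this step can be verified with a few lines of code. This is a sketch using the numbers from the example (34 of 60 students, p_{0} = 0.50); the variable names are illustrative.

```python
import math

n = 60          # students sampled
successes = 34  # students who answered yes
p0 = 0.50       # proportion hypothesized under H0

p_hat = successes / n

# Large-sample condition, checked with the hypothesized proportion.
large_enough = n * p0 >= 10 and n * (1 - p0) >= 10

# Test statistic, constructed assuming H0 is true.
z_t = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(round(p_hat, 3), large_enough, round(z_t, 2))  # 0.567 True 1.03
```

Carrying the unrounded 34/60 through the calculation matters here: substituting the rounded 0.567 gives a value that rounds to 1.04 instead of 1.03.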
Step 3: Find the p-Value Associated with the Test Statistic
If the null hypothesis is true, the test statistic has an approximate standard normal distribution. The p-value is the probability of observing a value as extreme as or more extreme than z_{T} in a random selection from the standard normal distribution. For this study, there would have been more support for the alternative if the sample proportion of nap-taking students had been greater than the observed \hat{p} = 0.567. This would have led to a larger value of z_{T}. Thus, the p-value is:
p-value = P(z > z_{T}) = P(z > 1.03) = 1 – P(z ≤ 1.03) = 1 – 0.8485 = 0.1515
The 0.8485 was obtained from the standard normal table (see Figure 16.4).
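The table lookup can also be reproduced in software. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the table and computes the p-value twice: once from the rounded z_{T} = 1.03 used above and once from the unrounded statistic. The variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Using the rounded test statistic, as in the table lookup.
z_rounded = 1.03
p_from_rounded = 1 - standard_normal.cdf(z_rounded)

# Using the unrounded test statistic from the sample (34 of 60, p0 = 0.50).
z_exact = (34 / 60 - 0.50) / (0.50 * 0.50 / 60) ** 0.5
p_from_exact = 1 - standard_normal.cdf(z_exact)

print(f"rounded z:   p-value = {p_from_rounded:.4f}")
print(f"unrounded z: p-value = {p_from_exact:.4f}")
```

The two results differ only slightly, by less than 0.001, so the conclusion in the next step does not depend on the rounding.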
Step 4: Decide Whether or Not to Reject the Null Hypothesis
If the null hypothesis were true, then we would expect to see a test statistic this extreme or more extreme about 15% of the time. This is not very unusual. Flipping a coin three times and obtaining all heads occurs less frequently (12.5%). If we use any traditional significance level, such as α = 0.05 or 0.01, then the p-value would be greater than the significance level. For all of these reasons, we would not reject the null hypothesis.
Step 5: State Conclusions in the Context of the Study
There is not sufficient evidence to conclude that more than half of the students at this large university regularly take afternoon or evening naps. It is important to note that \hat{p} = 0.567 is greater than 50%, so the sample is consistent with the alternative hypothesis. However, there is a possibility that p = 0.50 and sampling variability caused \hat{p} to be this large. If \hat{p} = 0.567 is enough larger than 0.50 to be practically important, the researcher may choose to conduct another study using a larger sample size.
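To see why a larger sample could change the conclusion, the sketch below recomputes the test for the same sample proportion at several sample sizes. Only n = 60 comes from the study; the larger sample sizes are hypothetical, and a new study would not necessarily produce the same sample proportion.

```python
import math
from statistics import NormalDist


def upper_tail_test(p_hat, p0, n):
    """Return (z_T, p-value) for Ha: p > p0 using the normal approximation."""
    z_t = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z_t, 1 - NormalDist().cdf(z_t)


p_hat = 34 / 60  # observed in the study
p0 = 0.50

# n = 60 is the actual study; 120, 240, and 480 are hypothetical.
for n in (60, 120, 240, 480):
    z_t, p = upper_tail_test(p_hat, p0, n)
    print(f"n = {n:3d}: z_T = {z_t:.2f}, p-value = {p:.4f}")
```

Because the standard error shrinks as n grows, the same difference between the sample proportion and 0.50 produces a larger test statistic and a smaller p-value.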
Hypothesis Testing For Proportions In Short
Statistical hypothesis testing is a fundamental tool in research today. The investigator takes the research hypothesis as the alternative hypothesis if at all possible. Two types of errors are possible. If a true null hypothesis is rejected, a type I error has been committed. If we do not reject a false null hypothesis, a type II error has been committed. The probability of a type I error is controlled. There are five steps to conducting a statistical hypothesis test. It is important to carefully complete each step.