Chi-square Test for Independence for AP Statistics

By McGraw-Hill Professional
Updated on Feb 4, 2011

Practice problems for these concepts can be found at:

A random sample of 400 residents of a large western city is polled to determine their attitudes concerning the affirmative action admissions policy of the local university. The residents are classified according to ethnicity (white, black, Asian) and whether or not they favor the affirmative action policy. The results are presented in the following table.

We are interested in whether or not, in the population from which these 400 residents were sampled, ethnicity and attitude toward affirmative action are related (note that, in this situation, we have one population and two categorical variables). That is, does knowledge of a person's ethnicity give us information about that person's attitude toward affirmative action? Another way of asking this is, "Are the variables independent in the population?" As part of a hypothesis test, the null hypothesis is that the two variables are independent, and the alternative is that they are not: H0: the variables are independent in the population vs. HA: the variables are not independent in the population. Alternatively, we could say H0: the variables are not related in the population vs. HA: the variables are related in the population.

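The test statistic for the independence hypothesis is the same chi-square statistic we saw for the goodness-of-fit test:

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where O is the observed count and E is the expected count in a cell, and the sum runs over all cells of the table.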

For a two-way table, the number of degrees of freedom is calculated as (number of rows – 1)(number of columns – 1) = (r – 1)(c – 1). As with the goodness-of-fit test, we require that the data come from a random sample and that the expected count in each cell be at least 5 (some texts instead require that there be no empty cells and that at least 80% of the cells have expected counts greater than 5).
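As a minimal sketch of the degrees-of-freedom formula and the expected-count condition, the Python below may help. The helper names `degrees_of_freedom` and `expected_counts_ok` are hypothetical, the expected counts are made up for illustration, and only the stricter "every cell at least 5" rule is checked.

```python
def degrees_of_freedom(table):
    """Return (r - 1)(c - 1) for a two-way table given as a list of rows."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


def expected_counts_ok(expected, minimum=5):
    """Check the stricter condition: every expected count is at least `minimum`."""
    return all(count >= minimum for row in expected for count in row)


# Hypothetical expected counts for a table with 3 rows and 2 columns
# (illustrative numbers only, not taken from the article's table).
example_expected = [
    [120.0, 80.0],
    [60.0, 40.0],
    [60.0, 40.0],
]

df = degrees_of_freedom(example_expected)   # (3 - 1)(2 - 1) = 2
ok = expected_counts_ok(example_expected)   # True: the smallest expected count is 40
print(df, ok)
```

A table with three ethnicity categories and two opinion categories therefore has 2 degrees of freedom, whichever variable is placed in the rows.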

Calculation of the expected values for chi-square can be labor intensive, but is usually done by technology (see the next Calculator Tip for details). However, you should know how expected values are arrived at.

example (calculation of expected value): Suppose we are testing for independence of the variables (ethnicity and opinion) in the previous example. For the two-way table with the given marginal values, find the expected value for the cell marked "Exp."

solution: There are two ways to approach finding an expected value; they are numerically equivalent, and you can use either. The first way is to find the probability of being in the desired cell by chance and then multiply that probability by the total in the table (just as we found an expected value with discrete random variables). The probability of being in the "Black" row is the "Black" row total divided by 400, and the probability of being in the "Do Not Favor" column is the "Do Not Favor" column total divided by 400.

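Assuming independence, the probability of being in "Exp" by chance is then the product of the "Black" row probability and the "Do Not Favor" column probability, and the expected count is that product multiplied by the table total of 400:

$$\text{Exp} = \frac{\text{row total}}{400} \cdot \frac{\text{column total}}{400} \cdot 400 = \frac{(\text{row total})(\text{column total})}{400}.$$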

The second way is to argue that, under the assumption that there is no relation between ethnicity and opinion, we'd expect each cell in the "Do Not Favor" column to show the same proportion of outcomes. In this case, each row of the "Do Not Favor" column would contain the same fraction of its row total, namely the "Do Not Favor" column total divided by 400; multiplying that fraction by the "Black" row total gives Exp. Most of you will probably find this way easier.
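The equivalence of the two approaches can be checked with a short Python sketch. The function names and the marginal totals (a row of 100, a column of 140, a table total of 400) are hypothetical and chosen only for illustration; exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction


def expected_by_probability(row_total, col_total, grand_total):
    """Method 1: P(row) * P(column) * table total, assuming independence."""
    p_row = Fraction(row_total, grand_total)
    p_col = Fraction(col_total, grand_total)
    return p_row * p_col * grand_total


def expected_by_proportion(row_total, col_total, grand_total):
    """Method 2: apply the column's overall proportion to the row total."""
    column_proportion = Fraction(col_total, grand_total)
    return column_proportion * row_total


# Hypothetical marginal totals (not the article's table): a row of 100 people,
# a column of 140 people, and 400 people in the whole table.
method_1 = expected_by_probability(100, 140, 400)
method_2 = expected_by_proportion(100, 140, 400)
print(method_1, method_2)  # both print 35
```

Both functions reduce to (row total)(column total)/(table total), which is why the two methods always agree.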

The χ² test for independence can be summarized as follows.
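One way to see the steps of the test together (expected counts, test statistic, degrees of freedom, P-value, decision) is the Python sketch below. It is an illustration under stated assumptions, not the article's own procedure: all function names are hypothetical, the observed counts are made up, the 0.05 significance level is chosen arbitrarily, and the P-value is computed from a standard recurrence for the chi-square upper tail rather than from a table or calculator.

```python
import math


def expected_counts(observed):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_square_upper_tail(x, df):
    """P(chi-square variable with `df` degrees of freedom >= x).

    Starts from the exact tail for df = 1 or df = 2 and steps up two
    degrees of freedom at a time.
    """
    if x <= 0:
        return 1.0
    half_x = x / 2
    if df % 2 == 0:
        tail, a = math.exp(-half_x), 1.0               # exact tail for df = 2
    else:
        tail, a = math.erfc(math.sqrt(half_x)), 0.5    # exact tail for df = 1
    while 2 * a < df:
        tail += half_x ** a * math.exp(-half_x) / math.gamma(a + 1)
        a += 1
    return tail


def independence_test(observed, alpha):
    """Return (statistic, df, p_value, reject_H0) for a two-way table of counts."""
    expected = expected_counts(observed)
    statistic = chi_square_statistic(observed, expected)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    p_value = chi_square_upper_tail(statistic, df)
    return statistic, df, p_value, p_value < alpha


# Hypothetical 2 x 3 table of observed counts (made up for illustration).
hypothetical_counts = [[40, 30, 30], [20, 30, 50]]
statistic, df, p_value, reject = independence_test(hypothetical_counts, alpha=0.05)
print(round(statistic, 2), df, round(p_value, 4), reject)
```

In practice the conditions (random sample, adequate expected counts) should be checked before the P-value is interpreted, and the conclusion should be stated in the context of the problem.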

example: A study of 150 cities was conducted to determine if crime rate is related to outdoor temperature. The results of the study are summarized in the following table:

Do these data provide evidence, at the 0.02 level of significance, that the crime rate is related to the temperature at the time of the crime?

