Analyzing Categorical Data Study Guide (page 3)

Updated on Oct 5, 2011


  1. H0: $p_{1j} = p_1, p_{2j} = p_2, \ldots, p_{rj} = p_r$ for $j = R, H$, where $p_{ij}$ is the proportion of population $j$ in age category $i$, R stands for renters, and H stands for homeowners. The alternative hypothesis is Ha: not H0.
  2. The expected counts are in Table 20.5.

    Table 20.5 Expected counts

    Notice that the expected counts do not have to be, and often are not, whole numbers. They should not be rounded to whole numbers.

  3. A random sample was selected from each population. The smallest expected count was 12, which is greater than the minimum of 5 needed for the test statistic to have an approximate χ²-distribution. Thus, the conditions for the test are satisfied. The value of the test statistic is
  4. The degrees of freedom associated with the test statistic are (r – 1)(c – 1) = 4 × 1 = 4. The largest number in the χ²-table row for 4 degrees of freedom is 19.997, which is in the 0.0005 column. Because the test statistic exceeds 19.997, the p-value is less than 0.0005.
  5. Because the p-value is so small, the evidence against the null hypothesis is very strong. Thus, we reject it in favor of the alternative hypothesis.
  6. There is evidence that the proportions in the age categories are not the same for homeowners and renters.
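
The expected-count and test-statistic steps above can be sketched in Python. This is a minimal illustration, not the guide's own method: the function names are hypothetical, the input is assumed to be a list of rows of observed counts (for example, age categories as rows and renters and homeowners as columns), and each expected count is taken as (row total × column total) / grand total, the standard rule for two-way tables. As the text advises, the expected counts are left unrounded.

```python
from typing import List

Table = List[List[float]]  # rows of observed counts (hypothetical layout)


def expected_counts(table: Table) -> Table:
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Deliberately not rounded: expected counts are often fractional.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table: Table) -> float:
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table: Table) -> int:
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


def expected_counts_ok(table: Table, minimum: float = 5.0) -> bool:
    """Condition check: every expected count is at least `minimum`."""
    return min(min(row) for row in expected_counts(table)) >= minimum
```

For the made-up table [[10, 20], [30, 40]], the expected counts come out as 12, 18, 28, and 42, the statistic is about 0.794, and the table has (2 – 1)(2 – 1) = 1 degree of freedom. A table with five age categories and two populations gives 4 degrees of freedom, as in step 4.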

Tests of Independence

Sometimes, data on two categorical variables are collected in one sample. For example, instead of sampling renters and homeowners separately in the previous example, we could have taken one sample and asked each study participant whether he or she is a renter or a homeowner and which age category he or she is in. The data could then be presented in a table like the one in the example; the only difference is the manner in which the data were collected. The null hypothesis is that the two variables are independent of one another, and the alternative is that they are not. The expected counts are computed as in the test for homogeneity. The conditions that must be satisfied are that the sample was randomly selected and that the expected counts are large enough, at least five in each cell. If these conditions are satisfied, the test statistic,

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}},$$

has an approximate χ²-distribution with (r – 1)(c – 1) degrees of freedom if the null hypothesis is true, where r and c are the numbers of rows and columns in the table. The p-value is the probability that a randomly selected observation from this χ²-distribution is greater than the test statistic. If the p-value is less than the specified significance level, the null hypothesis is rejected; otherwise, it is not rejected.
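
Rather than bracketing the p-value between columns of a printed χ²-table, it can be computed directly as the upper-tail area. A minimal sketch, assuming the degrees of freedom are a positive integer; `chi2_upper_tail` is a hypothetical helper that uses the closed forms of the regularized upper incomplete gamma function Q(df/2, x/2). Where SciPy is installed, `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-square(df), with df a positive integer.

    Evaluates Q(df/2, x/2) with the integer / half-integer closed forms.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = e^{-y} * sum_{i=0}^{m-1} y^i / i!
        m = df // 2
        terms = [math.exp(i * math.log(y) - y - math.lgamma(i + 1)) for i in range(m)]
    else:
        # Odd df = 2m + 1:
        # Q(m + 1/2, y) = erfc(sqrt(y)) + e^{-y} * sum_{i=1}^{m} y^{i - 1/2} / Gamma(i + 1/2)
        m = (df - 1) // 2
        terms = [math.erfc(math.sqrt(y))]
        terms += [
            math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
            for i in range(1, m + 1)
        ]
    return sum(terms)
```

As a check against the table entries quoted in this section, `chi2_upper_tail(19.997, 4)` is about 0.0005 and `chi2_upper_tail(1.074, 1)` is about 0.30. Working in log space keeps the terms from overflowing for moderately large statistics.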


A company wanted to assess the success of its television advertising campaign for a new product. It hired a pollster to find out whether those who saw the ad were more likely to have purchased the new product than those who had not. The pollster took a sample of 250 adults in the viewing area where the ad aired. Each study participant was asked whether he or she had seen the ad and whether he or she had purchased the new product. The results are presented in Table 20.7.

Table 20.7 The effect of the ad on sales

  1. State the null and alternative hypotheses of interest to the company.
  2. Find the expected counts.
  3. Verify the conditions for the test and, if satisfied, find the test statistic.
  4. Find the p-value.
  5. Decide whether or not to reject the null hypothesis.
  6. State your conclusion. Be sure it is in the context of the problem.


  1. H0: Having viewed the ad is independent of whether or not a person purchased the product.

    Ha: Having viewed the ad is not independent of whether or not a person purchased the product.

  2. The expected counts are in Table 20.8.

    Table 20.8

  3. The sample was randomly selected from the population that had the opportunity to see the ad, and all expected counts exceed five. Thus the conditions for the test are satisfied. The test statistic is then

  4. If the null hypothesis is true, the test statistic has an approximate χ²-distribution with (r – 1)(c – 1) = 1 × 1 = 1 degree of freedom. The smallest value in the χ²-table row for one degree of freedom is 1.074, in the 0.30 column. Because the test statistic is smaller than 1.074, the p-value is greater than 0.30.
  5. The p-value is large, indicating that data like those observed would not be at all unusual if the null hypothesis were true. Therefore, we do not reject the null hypothesis.
  6. There is not sufficient evidence to reject the hypothesis that seeing the ad is independent of whether or not the new product was purchased. This would be frustrating news for the company's management: the lack of a significant relationship means there is not sufficient evidence that people who saw the television ad were more likely to purchase the product. The company may be looking for a new advertising firm!
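
Steps 2 through 5 for a 2 × 2 table like Table 20.7 can be folded into one function. A sketch under stated assumptions: `independence_test_2x2` is a hypothetical name, α = 0.05 is an assumed default significance level (the text only refers to "the specified significance level"), no continuity correction is applied, and the p-value for 1 degree of freedom uses the identity P(χ²₁ ≥ x) = erfc(√(x/2)), which holds because a χ² variable with 1 degree of freedom is the square of a standard normal.

```python
import math
from typing import List, Tuple


def independence_test_2x2(
    table: List[List[float]], alpha: float = 0.05
) -> Tuple[float, float, bool]:
    """Chi-square test of independence for a 2x2 table of observed counts.

    Returns (statistic, p_value, reject_h0). alpha = 0.05 is an assumed default.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Condition from step 3: every expected count should be at least 5.
            if expected < 5:
                raise ValueError("expected count below 5; chi-square approximation is doubtful")
            statistic += (table[i][j] - expected) ** 2 / expected
    # (r - 1)(c - 1) = 1 degree of freedom, and P(chi2_1 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, p_value < alpha
```

Substituting the observed counts from Table 20.7 for `table` would produce the statistic, p-value, and decision asked for in the exercise; a p-value above α corresponds to the "do not reject" decision in step 5.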

Analyzing Categorical Data In Short

Categorical data lead to counts within each category, and χ² tests are suitable for testing hypotheses about these counts. When working with univariate categorical data, one can test whether the population proportions in each category equal a set of specified values. If the same univariate categorical variable is observed in independent samples from two or more populations, one can test whether the proportions in each category are the same for all populations. If two different categorical variables are observed in one sample, the test concerns whether or not the two variables are independent.
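
The first case in the summary, testing whether the category proportions equal specified values, uses the same statistic with expected count n·pᵢ for category i and k – 1 degrees of freedom for k categories. A minimal sketch; the function name `goodness_of_fit_statistic` and its input format (a list of observed counts and a matching list of hypothesized proportions) are assumptions.

```python
from typing import Sequence, Tuple


def goodness_of_fit_statistic(
    observed: Sequence[float], proportions: Sequence[float]
) -> Tuple[float, int]:
    """Chi-square goodness-of-fit statistic and its degrees of freedom.

    Expected count for category i is n * p_i; df = k - 1 for k categories.
    """
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1
```

For example, made-up observed counts [30, 20, 50] tested against hypothesized proportions [0.25, 0.25, 0.5] give expected counts 25, 25, and 50, a statistic of 2.0, and 2 degrees of freedom.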

