Practice problems for these concepts can be found at:

- Inference for Categorical Data Multiple Choice Practice Problems for AP Statistics
- Inference for Categorical Data Free Response Practice Problems for AP Statistics
- Inference for Categorical Data Review Problems for AP Statistics
- Inference for Categorical Data Rapid Review for AP Statistics

The following are the approximate percentages for the different blood types among white Americans: A: 40%; B: 11%; AB: 4%; O: 45%. A random sample of 1000 black Americans yielded the following blood type data: A: 270; B: 200; AB: 40; O: 490. Does this sample provide evidence that the distribution of blood types among black Americans differs from that of white Americans, or could the sample values simply be due to sampling variation? This is the kind of question we can answer with the chi-square **goodness-of-fit** test. ("Chi" is the Greek letter *Χ*, so chi-square is, logically enough, *Χ*^{2}.) In the chi-square goodness-of-fit test, there is one categorical variable (blood type) and one population (black Americans). In this chapter we will also encounter a situation in which one categorical variable is measured across two populations (the chi-square test for homogeneity of proportions) and a situation in which two categorical variables are measured across a single population (the chi-square test for independence).

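To answer this question, we need to compare the **observed values** in the sample with the **expected values** we would get *if* black Americans really had the same distribution of blood types as white Americans. The values we need for this are summarized in the following table.

| Blood type | Proportion among white Americans | Expected count (n = 1000) | Observed count |
|---|---|---|---|
| A | 0.40 | (0.40)(1000) = 400 | 270 |
| B | 0.11 | (0.11)(1000) = 110 | 200 |
| AB | 0.04 | (0.04)(1000) = 40 | 40 |
| O | 0.45 | (0.45)(1000) = 450 | 490 |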

It appears that the observed and expected counts differ noticeably for types A and B, but not as much for types AB and O. The table can be rewritten as follows.

Before working through this problem, a note on symbolism. Often in this book, and in statistics in general, we use English letters for statistics (measurements from data) and Greek letters for parameters (population values). Hence, x̄ is a sample mean and μ is a population mean; *s* is a sample standard deviation and σ is a population standard deviation, and so on. We follow the same convention in this chapter: we use *Χ*^{2} when referring to a population value or to the name of a test, and X^{2} when referring to the chi-square statistic.

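For each category, the chi-square statistic (X^{2}) takes the squared difference between the observed and expected counts relative to the expected count. It then adds these terms over all categories. The X^{2} statistic is computed as follows:

$$X^{2} = \sum \frac{(O - E)^{2}}{E}$$

Here *O* is the observed count and *E* is the expected count in each category.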
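The statistic X^{2} = Σ(O − E)^{2}/E translates directly into a few lines of Python. The following is a minimal sketch, not the book's method. It uses the observed blood-type counts and the expected counts 1000 × (0.40, 0.11, 0.04, 0.45). The function name `chi_square_statistic` is a hypothetical choice.

```python
def chi_square_statistic(observed, expected):
    """Return X^2 = sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Blood-type example, categories in the order A, B, AB, O.
observed = [270, 200, 40, 490]
expected = [400, 110, 40, 450]  # 1000 * (0.40, 0.11, 0.04, 0.45)

# Per-category contributions are 42.25, about 73.64, 0, and about 3.56.
x2 = chi_square_statistic(observed, expected)
print(round(x2, 2))  # 119.44
```

Most of the statistic comes from types A and B. This matches the earlier observation that those categories differ most from their expected counts.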

The chi-square distribution depends on the number of degrees of freedom. For the goodness-of-fit test, this equals the number of categories minus 1 (df = n – 1, where n here is the number of categories, not the sample size). There is a different chi-square distribution for each number of degrees of freedom. Assuming a random sample that is large enough, the X^{2} statistic approximately follows the chi-square distribution with the appropriate df. The probability of getting a X^{2} value at least as large as the one observed (the P-value) can be read from a table of X^{2} critical values or determined with a calculator. There is a X^{2} table in the back of this book, and you will be supplied a table like this on the AP exam. We will demonstrate the use of both tables and the calculator in the examples and problems that follow.
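When df is a positive integer, the upper-tail probability of a chi-square distribution has a closed form. This means a table look-up can be checked with Python's standard library alone. The function below is a sketch. Its name `chi2_sf` is hypothetical. A library routine such as SciPy's `scipy.stats.chi2.sf` does the same job.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom >= x).

    Uses the closed-form expressions that hold for positive integer df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    total = 0.0
    term = math.sqrt(x)  # r = 1 term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


# Goodness-of-fit df = number of categories - 1; blood types have 4 categories.
df = 4 - 1
print(round(chi2_sf(7.815, df), 3))  # 0.05, matching the table critical value for df = 3
print(chi2_sf(119.44, df) < 0.05)    # True
```

The check against 7.815 (the tabled df = 3 value for an upper-tail area of 0.05) shows the computed tail area agreeing with a chi-square table.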

A hypothesis test for *Χ*^{2} goodness-of-fit follows the now-familiar pattern. The essential parts of the test are summarized in the following table.

Let's use the four-step hypothesis-testing procedure.

example: The following are the approximate percentages for the different blood types among white Americans: A: 40%; B: 11%; AB: 4%; O: 45%. A random sample of 1000 black Americans yielded the following blood type data: A: 270; B: 200; AB: 40; O: 490. Does this sample indicate that the distribution of blood types among black Americans differs from that of white Americans?

solution:
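The computations for this example can be sketched in Python. The sketch assumes a significance level of α = 0.05, since the example does not state one. It uses the standard chi-square table critical value 7.815 for df = 3. It also applies the common rule of thumb that every expected count should be at least 5. The function `goodness_of_fit` and the constant name are hypothetical.

```python
import math


def goodness_of_fit(observed, proportions, critical_value):
    """Chi-square goodness-of-fit test against hypothesized proportions.

    Returns (expected counts, X^2, df, reject_h0), where H0 is rejected
    when X^2 exceeds the supplied table critical value.
    """
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]
    # Common rule of thumb: every expected count should be at least 5.
    if min(expected) < 5:
        raise ValueError("an expected count is below 5; the chi-square approximation may be poor")
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, x2, df, x2 > critical_value


observed = [270, 200, 40, 490]          # A, B, AB, O among 1000 black Americans
proportions = [0.40, 0.11, 0.04, 0.45]  # H0: same distribution as white Americans
CRITICAL_DF3_ALPHA05 = 7.815            # chi-square table value, df = 3, alpha = 0.05

expected, x2, df, reject = goodness_of_fit(observed, proportions, CRITICAL_DF3_ALPHA05)
print(f"X^2 = {x2:.2f}, df = {df}, reject H0: {reject}")  # X^2 = 119.44, df = 3, reject H0: True
```

Because 119.44 is far larger than 7.815, the P-value is far below 0.05. At that assumed level, the sample gives strong evidence that the distribution of blood types among black Americans differs from that of white Americans.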

example: The statistics teacher, Mr. Hinders, used his calculator to simulate rolling a die 96 times and store the results in a list L1. He did this by entering MATH PRB randInt(1,6,96)→(L1). Next he sorted the list (STAT SortA(L1)). He then counted how many times each face value appeared. The results were as follows (this is called a one-way table).

Does it appear that the teacher's calculator is simulating a fair die? (That is, are the observations consistent with what you would expect to get *if* the die were fair?)

**solution:**
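This sketch does not use the teacher's counts. Instead it generates its own hypothetical 96 rolls with Python's `random` module, standing in for `randInt(1,6,96)`. The seed and the function names are illustrative choices. It then tests the counts against the fair-die model, in which each face has expected count 96/6 = 16 and df = 6 − 1 = 5. The critical value 11.070 (df = 5, α = 0.05) comes from a standard chi-square table. The significance level α = 0.05 is an assumption.

```python
import random


def simulate_die_counts(rolls, seed=None):
    """Roll a fair six-sided die `rolls` times and tally each face (a one-way table)."""
    rng = random.Random(seed)
    counts = [0] * 6
    for _ in range(rolls):
        counts[rng.randint(1, 6) - 1] += 1  # randint includes both endpoints
    return counts


def chi_square_statistic(observed, expected):
    """Return X^2 = sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


ROLLS = 96
counts = simulate_die_counts(ROLLS, seed=1)  # hypothetical data, not the teacher's
expected = [ROLLS / 6] * 6                   # 16 per face if the die is fair
x2 = chi_square_statistic(counts, expected)
df = len(counts) - 1                         # 6 faces - 1 = 5
CRITICAL_DF5_ALPHA05 = 11.070                # chi-square table value, df = 5, alpha = 0.05

decision = "reject H0" if x2 > CRITICAL_DF5_ALPHA05 else "fail to reject H0"
print(counts, round(x2, 2), decision)
```

With a genuinely fair generator, X^{2} exceeds 11.070 in only about 5% of repeated simulations. A single rejection would therefore not by itself prove the generator unfair.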
