Practice problems for these concepts can be found at:

- One-Variable Data Analysis Multiple Choice Practice Problems for AP Statistics
- One-Variable Data Analysis Free Response Practice Problems for AP Statistics
- One-Variable Data Analysis Review Problems for AP Statistics
- One-Variable Data Analysis Rapid Review for AP Statistics

Our purpose in drawing a graph of data is to get a visual sense of it. We are interested in the **shape** of the data as well as **gaps** in the data, **clusters** of datapoints, and **outliers** (which are datapoints that lie well outside of the general pattern of the data).

### Shape

When we describe **shape**, we are primarily interested in the extent to which the graph appears to be **symmetric** (has symmetry around some axis), **mound-shaped (bell-shaped)**, **skewed** (data are skewed to the left if the tail is to the left; to the right if the tail is to the right), **bimodal** (has two distinct locations with many scores), or **uniform** (frequencies of the various values are more-or-less constant).

There are four types of graph we want to look at in order to help us understand the shape of a distribution: dotplot, stemplot, histogram, and boxplot. We use the following 31 scores from a 50-point quiz given to a community college statistics class to illustrate the first three plots (we will look at a boxplot in a few pages):

### Dotplot

A **dotplot** is a very simple type of graph that involves plotting the data values, with dots, above the corresponding values on a number line. A dotplot of the scores on the statistics quiz, drawn by a statistics computer package, looks like this:

[Calculator note: Most calculators do not have a built-in function for drawing dotplots. There are work-arounds that will allow you to draw a dotplot on a calculator, but they involve more effort than they are worth.]
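Where no statistics package is at hand, a dotplot can be approximated in plain text. The following is a minimal sketch, not the output of any particular package: the function name `text_dotplot` and the sample scores are hypothetical (they are not the quiz data discussed above), and the code assumes non-negative integer values.

```python
from collections import Counter


def text_dotplot(values, dot="."):
    """Return a text dotplot with one column per integer from min to max."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    axis = list(range(lo, hi + 1))
    # Column width: wide enough for the longest axis label plus one space.
    width = max(len(str(lo)), len(str(hi))) + 1
    rows = []
    # Build rows from the tallest stack of dots down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            (dot if counts[v] >= level else " ").rjust(width) for v in axis
        )
        rows.append(row.rstrip())
    # The last row is the number line itself.
    rows.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)


# Hypothetical scores for illustration only, NOT the quiz data from the text.
scores = [3, 5, 5, 6, 6, 6, 7, 9]
print(text_dotplot(scores))
```

Each dot sits above its value on the number line, so stacks of dots show where scores repeat and empty columns show gaps.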

### Stemplot (Stem and Leaf Plot)

A **stemplot** is a bit more complicated than a dotplot. Each data value has a *stem* and a *leaf*. There are no mathematical rules for what constitutes the stem and what constitutes the leaf. Rather, the nature of the data will suggest reasonable choices for the stem and leaves. With the given score data, we might choose the first digit to be the stem and the second digit to be the leaf. So, the number 42 in a stem and leaf plot would show up as 4 | 2. All the leaves for a common stem are usually written on the same line, often in increasing order, so the line with stem 4 could be written as: 4 | 0112236. The complete stemplot of the quiz data looks like this:

Using the 10's digit for the stem and the units digit for the leaf made good sense with this data set; other choices make sense depending on the type of data. For example, suppose we had a set of gas mileage tests on a particular car (e.g., 28.3, 27.5, 28.1,…). In this case, it might make sense to make the stems the integer part of the number and the leaf the decimal part. As another example, consider measurements on a microscopic computer part (0.0018, 0.0023, 0.0021,…). Here you'd probably want to ignore the 0.00 (since that doesn't help distinguish between the values) and use the first nonzero digit as the stem and the second nonzero digit as the leaf.
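The choice of stem and leaf described above can be sketched in code. This is an illustrative sketch under stated assumptions: the helper names `split_stem_leaf` and `stemplot` and the `leaf_unit` parameter are hypothetical, and the code assumes non-negative values. The sample values are the ones quoted in the text (42, the stem-4 row, the gas mileage readings, and the part measurements).

```python
from collections import defaultdict


def split_stem_leaf(value, leaf_unit=1):
    """Split a value into (stem, leaf).

    leaf_unit is the place value of the leaf digit (1, 0.1, 0.0001, ...).
    Digits smaller than leaf_unit are rounded away.
    """
    scaled = round(value / leaf_unit)  # round() absorbs floating-point error
    return divmod(scaled, 10)


def stemplot(values, leaf_unit=1):
    """Return a text stemplot with leaves in increasing order on each stem."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = split_stem_leaf(value, leaf_unit)
        leaves[stem].append(leaf)
    rows = []
    # List every stem from smallest to largest so that gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return "\n".join(rows)


print(split_stem_leaf(42))              # (4, 2): tens digit is the stem
print(split_stem_leaf(28.3, 0.1))       # (28, 3): integer part is the stem
print(split_stem_leaf(0.0018, 0.0001))  # (1, 8): leading zeros are ignored

# The values behind the stem-4 row quoted in the text.
print(stemplot([40, 41, 41, 42, 42, 43, 46]))  # 4 | 0112236
```

Changing `leaf_unit` is the only adjustment needed to move between the three kinds of data, which makes it easy to try more than one arrangement.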

Some data lend themselves to breaking the stem into two or more parts. For these data, the stem "4" could be shown twice, once with the leaves 0–4 and once with the leaves 5–9. Done this way, the stemplot for the scores data would look like this (there is a single "1" because there are no leaves with the values 0–4 for a stem of 1; similarly, there is only one "5" since there are no values in the 55–59 range):

The visual image is of data that are slightly skewed to the right (that is, toward the higher scores). We do notice a *cluster* of scores in the high 20s that was not obvious when we used an increment of 10 rather than 5. There is no hard and fast rule about how to break up the stems; it's easy to try different arrangements on most computer packages.
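Splitting each stem into a 0–4 row and a 5–9 row can be sketched as follows. This is one possible approach, not the method used by any particular package; the function name `split_stemplot` and the sample scores are hypothetical (they are not the quiz data), and the code assumes non-negative integers.

```python
def split_stemplot(values):
    """Return a stemplot where each tens-digit stem has a 0-4 row and a 5-9 row."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        key = (stem, leaf >= 5)  # False = low half (0-4), True = high half (5-9)
        rows.setdefault(key, []).append(leaf)
    # Start and stop at the first and last occupied half-stems.
    first, last = min(rows), max(rows)
    lines = []
    stem, high = first
    while (stem, high) <= last:
        digits = "".join(str(d) for d in rows.get((stem, high), []))
        lines.append(f"{stem} | {digits}".rstrip())
        # Advance to the next half-stem: low -> high, then on to the next stem.
        stem, high = (stem, True) if not high else (stem + 1, False)
    return "\n".join(lines)


# Hypothetical scores for illustration only, NOT the quiz data from the text.
scores = [17, 22, 26, 28, 29, 31, 33, 35, 40, 41, 46, 50]
print(split_stemplot(scores))
```

With these sample values the stems 1 and 5 each appear only once, for the same reason given above: the plot starts at the first occupied half-stem and stops at the last.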

Sometimes plotting more than one stemplot, side-by-side or back-to-back, can provide us with comparative information. The following stemplot shows the results of two quizzes given for this class (one of them the one discussed above):

It can be seen from this comparison that the scores on Quiz #1 (on the left) were generally higher than those on Quiz #2: there are a lot more scores at the upper end. Although both distributions are reasonably symmetric, the one on the left is skewed somewhat toward the smaller scores, and the one on the right is skewed somewhat toward the larger numbers.
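A back-to-back stemplot shares one column of stems between two data sets. The sketch below is a minimal illustration: the function name `back_to_back_stemplot` and both sets of scores are hypothetical (they are not the two quizzes discussed above), and the code assumes both lists are non-empty and hold non-negative integers.

```python
def back_to_back_stemplot(left_values, right_values):
    """Return a back-to-back stemplot sharing one column of tens-digit stems."""
    left, right = {}, {}
    for target, values in ((left, left_values), (right, right_values)):
        for value in sorted(values):
            stem, leaf = divmod(value, 10)
            target.setdefault(stem, []).append(str(leaf))
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    # Left-hand leaves are reversed so they increase outward from the stem.
    left_text = {s: "".join(reversed(left.get(s, []))) for s in stems}
    width = max(len(text) for text in left_text.values())
    lines = []
    for s in stems:
        right_leaves = "".join(right.get(s, []))
        lines.append(f"{left_text[s].rjust(width)} | {s} | {right_leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical scores for illustration only, NOT the quizzes from the text.
quiz_1 = [28, 33, 35, 41, 44, 47, 48]
quiz_2 = [19, 22, 25, 27, 31, 34, 42]
print(back_to_back_stemplot(quiz_1, quiz_2))
```

Reading the two sides against the shared stems makes it easy to see which data set has more values at the upper end.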

[Note: Most calculators do not have a built-in function for drawing stemplots. However, most computer programs do have this ability and it's quite easy to experiment with various stem increments.]
