Graphical Analysis for AP Statistics

By McGraw-Hill Professional
Updated on Feb 5, 2011

Practice problems for these concepts can be found at:

Our purpose in drawing a graph of data is to get a visual sense of it. We are interested in the shape of the data as well as gaps in the data, clusters of datapoints, and outliers (which are datapoints that lie well outside of the general pattern of the data).

Shape

When we describe shape, we are primarily interested in the extent to which the graph appears to be symmetric (one half roughly mirrors the other about a central axis), mound-shaped (bell-shaped), skewed (to the left if the longer tail is on the left; to the right if the longer tail is on the right), bimodal (has two distinct locations where many scores cluster), or uniform (frequencies of the various values are more or less constant).

Graphical Analysis

There are four types of graph we want to look at in order to help us understand the shape of a distribution: dotplot, stemplot, histogram, and boxplot. We use the following 31 scores from a 50-point quiz given to a community college statistics class to illustrate the first three plots (we will look at a boxplot in a few pages):

Dotplot

A dotplot is a very simple type of graph that involves plotting the data values, with dots, above the corresponding values on a number line. A dotplot of the scores on the statistics quiz, drawn by a statistics computer package, looks like this:

[Calculator note: Most calculators do not have a built-in function for drawing dotplots. There are work-arounds that will allow you to draw a dotplot on a calculator, but they involve more effort than they are worth.]
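Since calculators are of little help here, a few lines of code can stand in for the statistics package. The following is a minimal Python sketch, not the package output referred to above. The function name `dotplot` is our own, and the only data used are the seven scores in the 40s, read off the stemplot row 4 | 0112236 quoted in the stemplot discussion.

```python
from collections import Counter


def dotplot(values):
    """Return the lines of a text dotplot for integer data (top row first)."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    width = len(str(max(values))) + 1      # one column per axis value
    height = max(counts.values())          # tallest stack of dots
    lines = []
    for level in range(height, 0, -1):
        # A value gets a dot on this level if it occurs at least `level` times.
        row = "".join(("*" if counts[x] >= level else " ").rjust(width)
                      for x in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(x).rjust(width) for x in axis))
    return lines


# Scores in the 40s only; the remaining quiz scores are not reproduced here.
scores_in_40s = [40, 41, 41, 42, 42, 43, 46]
print("\n".join(dotplot(scores_in_40s)))
```

Each value on the number line gets one column, and repeated scores stack vertically, so the gap at 44 and 45 shows up as two empty columns.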

Stemplot (Stem and Leaf Plot)

A stemplot is a bit more complicated than a dotplot. Each data value is split into a stem and a leaf. There are no mathematical rules for what constitutes the stem and what constitutes the leaf; rather, the nature of the data will suggest reasonable choices. With the given score data, we might choose the first digit to be the stem and the second digit to be the leaf, so the number 42 would show up in a stem and leaf plot as 4 | 2. All the leaves for a common stem are usually written on the same line, typically in increasing order, so the line with stem 4 could be written as 4 | 0112236. The complete stemplot of the quiz data looks like this:

Using the 10's digit for the stem and the units digit for the leaf made good sense with this data set; other choices make sense depending on the type of data. For example, suppose we had a set of gas mileage tests on a particular car (e.g., 28.3, 27.5, 28.1,…). In this case, it might make sense to make the stems the integer part of the number and the leaf the decimal part. As another example, consider measurements on a microscopic computer part (0.0018, 0.0023, 0.0021,…). Here you'd probably want to ignore the 0.00 (since that doesn't help distinguish between the values) and use the first nonzero digit as the stem and the second nonzero digit as the leaf.
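The three choices of stem and leaf described above differ only in which decimal place supplies the leaf, so one routine can cover all of them. The Python sketch below is illustrative only: the function `stemplot` and its `leaf_unit` parameter are our own names, and the inputs are just the values quoted in the text (the mileage and computer-part lists are truncated there, so only the three listed values of each are used).

```python
def stemplot(values, leaf_unit=1):
    """Return stemplot rows; leaf_unit is the place value of the leaf digit."""
    # Rescale so the leaf is the last digit of an integer. round() absorbs
    # the small floating-point error introduced by the division.
    scaled = sorted(round(v / leaf_unit) for v in values)
    rows = {}
    for n in scaled:
        rows.setdefault(n // 10, []).append(str(n % 10))
    # Stems with no leaves are kept so that gaps in the data stay visible.
    return [
        f"{stem} | {''.join(rows.get(stem, []))}".rstrip()
        for stem in range(scaled[0] // 10, scaled[-1] // 10 + 1)
    ]


# Quiz scores in the 40s: tens digit as stem, units digit as leaf.
print(stemplot([40, 41, 41, 42, 42, 43, 46]))
# Gas mileage: integer part as stem, first decimal as leaf.
print(stemplot([28.3, 27.5, 28.1], leaf_unit=0.1))
# Computer part: leading 0.00 ignored, first nonzero digit as stem.
print(stemplot([0.0018, 0.0023, 0.0021], leaf_unit=0.0001))
```

Passing the place value of the leaf, rather than hard-coding the tens digit, is one way to make trying a different stem a one-argument change.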

Some data lend themselves to breaking the stem into two or more parts. For these data, the stem "4" could be shown twice, once with the leaves 0–4 and once with the leaves 5–9. Done this way, the stemplot for the scores data would look like this (there is a single "1" because there are no leaves with the values 0–4 for a stem of 1; similarly, there is only one "5" since there are no values in the 55–59 range):

The visual image is of data that are slightly skewed to the right (that is, toward the higher scores). We do notice a cluster of scores in the high 20s that was not obvious when we used an increment of 10 rather than 5. There is no hard and fast rule about how to break up the stems; it's easy to try different arrangements on most computer packages.
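Splitting each stem into a 0–4 row and a 5–9 row is easy to try in code. The Python sketch below assumes whole-number scores; `split_stemplot` is a hypothetical helper of ours, and it is run only on the seven scores from the stem-4 row quoted earlier (4 | 0112236), not on the full quiz data.

```python
def split_stemplot(scores):
    """Stemplot of whole-number scores, each stem split into 0-4 and 5-9."""
    scores = sorted(scores)
    rows = {}
    for s in scores:
        # s // 5 indexes the half-stem: 40-44 -> 8, 45-49 -> 9.
        rows.setdefault(s // 5, []).append(str(s % 10))
    first, last = scores[0] // 5, scores[-1] // 5
    # half // 2 recovers the tens digit; empty half-stems are kept as gaps.
    return [
        f"{half // 2} | {''.join(rows.get(half, []))}".rstrip()
        for half in range(first, last + 1)
    ]


for row in split_stemplot([40, 41, 41, 42, 42, 43, 46]):
    print(row)
```

Because rows are generated only from the lowest occupied half-stem to the highest, a stem appears once rather than twice at either end when one of its halves is empty, which matches the single "1" and single "5" described above.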

Sometimes plotting more than one stemplot, side-by-side or back-to-back, can provide us with comparative information. The following stemplot shows the results of two quizzes given for this class (one of them the one discussed above):

It can be seen from this comparison that the scores on Quiz #1 (on the left) were generally higher than those on Quiz #2: there are a lot more scores at the upper end. Although both distributions are reasonably symmetric, the one on the left is skewed somewhat toward the smaller scores, and the one on the right is skewed somewhat toward the larger numbers.
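A back-to-back stemplot shares one column of stems between two sets of leaves. The Python sketch below shows one way to lay it out. Both score lists are made up purely for illustration and are not the scores from the two quizzes discussed here, and `back_to_back` is our own hypothetical function name.

```python
def back_to_back(left, right):
    """Return rows of a back-to-back stemplot for two lists of whole numbers."""
    def leaves_by_stem(scores):
        rows = {}
        for s in sorted(scores):
            rows.setdefault(s // 10, []).append(str(s % 10))
        return rows

    left_rows, right_rows = leaves_by_stem(left), leaves_by_stem(right)
    combined = list(left) + list(right)
    stems = range(min(combined) // 10, max(combined) // 10 + 1)
    width = max(len(leaves) for leaves in left_rows.values())
    lines = []
    for stem in stems:
        # Left-hand leaves are reversed so they increase outward from the stem.
        l = "".join(reversed(left_rows.get(stem, [])))
        r = "".join(right_rows.get(stem, []))
        lines.append(f"{l:>{width}} | {stem} | {r}".rstrip())
    return lines


# HYPOTHETICAL scores, invented for this example only.
quiz_a = [31, 38, 42, 45, 47]
quiz_b = [22, 25, 31, 34, 40]
print("\n".join(back_to_back(quiz_a, quiz_b)))
```

With the leaves on each side growing away from the shared stems, the side whose longer rows sit next to the larger stems is the one with generally higher scores.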

[Note: Most calculators do not have a built-in function for drawing stemplots. However, most computer programs do have this ability and it's quite easy to experiment with various stem increments.]
