Education.com

# Histograms and Boxplots Study Guide (page 2)

Updated on Oct 5, 2011

## Grouped Discrete Data

When working with discrete data, each possible count is a natural category for a frequency distribution. Sometimes, numerous possible counts exist or a few large or small values are far away from most of the data. In these cases, a frequency distribution with a very long list of possible values does not aid understanding of the data. By grouping the observed values to form class intervals, or simply classes, greater insight into the data is often gained.

As an example, New Zealand's Tourism Research Council conducts an annual survey of international visitors, which records the length of stay of people traveling to New Zealand from other countries. The results from the 2004 survey are displayed in Table 7.2 (Source: Tourism Research Council of New Zealand website at www.trcnz.govt.nz/Surveys/International+Visitor+Survey/Data+and+Analysis/).

Suppose we had listed each length of stay in days. We would have had more than 30 classes. By grouping, we have eight classes, and it is easy to see that 22% of the international visitors stayed less than five days and 16% stayed at least 30 days. However, we have lost some information. We do not know how many stayed for exactly three days or how many stayed for more than 60 days.
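The grouping idea can be sketched in a few lines of Python. This is only an illustration: the class boundaries in `LOWER_BOUNDS` and the lengths of stay in `stays` are made up for the example and are not the values from Table 7.2.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds (in days) for each class; the last class is open-ended.
LOWER_BOUNDS = [0, 5, 10, 15, 20, 30]

def class_label(days):
    """Return a label such as '5-9' or '30+' for a length of stay in whole days."""
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    i = bisect_right(LOWER_BOUNDS, days) - 1
    if i == len(LOWER_BOUNDS) - 1:
        return f"{LOWER_BOUNDS[i]}+"
    return f"{LOWER_BOUNDS[i]}-{LOWER_BOUNDS[i + 1] - 1}"

def grouped_relative_frequencies(stays):
    """Map each class label to the fraction of observations falling in it."""
    counts = Counter(class_label(d) for d in stays)
    n = len(stays)
    return {label: count / n for label, count in counts.items()}

# Made-up lengths of stay, for illustration only.
stays = [2, 3, 4, 7, 8, 12, 14, 21, 35, 90]
freqs = grouped_relative_frequencies(stays)
print(freqs)
```

Once the values are grouped, the table is short, but the individual values (3 days versus 4 days, or 35 days versus 90 days) can no longer be recovered from it.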

## Tabular Displays of Continuous Data

The difficulty in displaying numerical data in tables becomes more pronounced when working with continuous numerical data. Consider the heights of the orchestra members discussed in Lessons 3 and 4. There were 62 orchestra members who had 52 different heights! Forming class intervals allows us to display the frequencies within each class. The challenge is that no natural intervals exist, so we have to define our own. Because the shortest orchestra member was 53.5 inches and the tallest 76.7 inches, it seems natural to begin the classes at 53 inches and stop them at 77 inches. The question is how wide each interval should be. If we form intervals of 2 inches beginning at 53 inches, we would have 16 classes: 53–55, 55–57, 57–59, 59–61, 61–63, 63–65, 65–67, 67–69, 69–71, 71–73, 73–75, 75–77, 77–79, 79–81, 81–83, 83–85. If we choose 4-inch intervals, we would have eight classes: 53–57, 57–61, 61–65, 65–69, 69–73, 73–77, 77–81, 81–85. For now, let's look at both of these possibilities and later decide which one would be best.
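As a quick check on the class counts, the two lists of intervals can be generated programmatically. The sketch below assumes the classes run from 53 to 85 inches, as in the lists above; the function name `class_intervals` is our own.

```python
def class_intervals(start, stop, width):
    """Return (lower, upper) pairs of equal width that cover start up to stop."""
    intervals = []
    lower = start
    while lower < stop:
        intervals.append((lower, lower + width))
        lower += width
    return intervals

two_inch = class_intervals(53, 85, 2)
four_inch = class_intervals(53, 85, 4)

print(len(two_inch), two_inch[0], two_inch[-1])
print(len(four_inch), four_inch[0], four_inch[-1])
```

Doubling the width halves the number of classes, which is the trade-off examined in the histogram discussion.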

We have one more problem that must be addressed before we can complete the frequency table: What happens if an observation falls on the boundary? As an illustration, one orchestra member is 65.0 inches tall. Should that person be in the 63–65 or the 65–67 class interval? We will adopt the convention that the lower boundary but not the upper boundary is included in a class. For the 65.0-inch-tall orchestra member, this means that she will be counted in the 65–67 class interval.
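The boundary convention amounts to treating each class as a half-open interval: lower boundary included, upper boundary excluded. A minimal sketch, assuming classes start at 53 inches (the helper `assign_class` is hypothetical):

```python
import math

def assign_class(height, start=53, width=2):
    """Return the (lower, upper) class containing height.

    The lower boundary is included and the upper boundary is excluded,
    so a value exactly on a boundary goes to the higher class.
    """
    index = math.floor((height - start) / width)
    lower = start + index * width
    return (lower, lower + width)

print(assign_class(65.0))           # the 65.0-inch member, 2-inch classes
print(assign_class(64.9))           # just below the boundary
print(assign_class(65.0, width=4))  # same member, 4-inch classes
```

With 2-inch classes the 65.0-inch member lands in 65–67, while a height of 64.9 inches stays in 63–65.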

For 2-inch intervals, we would have the frequency table shown here in Table 7.3.

Using the 4-inch class intervals, we obtain the frequency distribution shown in Table 7.4.

Before deciding whether to use the 2-inch or 4-inch class intervals, we will learn how to construct histograms.

## Histograms

First, suppose we have ungrouped discrete data as in the number of partners for the Gunnison's prairie dog example. Constructing a histogram requires the following steps:

1. On the horizontal axis, draw a scale, mark the possible values, and label the axis.
2. On the vertical axis, draw a scale, mark it with either frequencies or relative frequencies, and label the scale.
3. For each possible value, draw a rectangle centered at that value with a height determined by the corresponding frequency or relative frequency.
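The three steps above can be sketched with a simple text rendering in Python. The `partners` list is made-up data for illustration, not the prairie dog counts from the lesson, and the function names are our own.

```python
from collections import Counter

def relative_frequencies(values):
    """Relative frequency of every possible count between the min and max observed."""
    counts = Counter(values)
    n = len(values)
    # Include values with zero frequency so every possible value is marked on the axis.
    return {v: counts[v] / n for v in range(min(values), max(values) + 1)}

def text_histogram(rel_freqs, scale=40):
    """Draw one bar per value; bar length is proportional to relative frequency."""
    lines = []
    for value, rf in rel_freqs.items():
        bar = "#" * round(rf * scale)
        lines.append(f"{value:>3} | {bar} {rf:.2f}")
    return "\n".join(lines)

# Made-up numbers of partners, for illustration only.
partners = [0, 1, 1, 1, 2, 2, 3, 1, 2, 1]
rel = relative_frequencies(partners)
print(text_histogram(rel))
```

The bars are drawn sideways here, so the value axis runs down the page, but the heights carry the same information as the rectangles in a conventional histogram.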

#### Example

Construct the relative frequency histogram for the number of partners for Gunnison's prairie dogs.

#### Solution

The relative frequency histogram is displayed in Figure 7.1.

When working with continuous data, class intervals must be formed, as in Tables 7.3 and 7.4, before a histogram can be constructed. Once this is done, the process of constructing a histogram is similar to that for discrete data. For the 2-inch intervals of orchestra members' heights presented in Table 7.3, the histogram is shown in Figure 7.2.

When looking at a histogram, you should look for a center value, the extent of spread or dispersion, the general shape, the location and number of peaks, and the presence of gaps and outliers. Here, perhaps the most notable feature is the three peaks at 60, 68, and 72 inches. These make determining the center, or typical, value a little difficult. The center of the data seems to be about 66 inches. This agrees well with the mean of 66.4 inches and the median of 66.85 inches found in Lesson 5. The spread seems to be from about 54 to 84 inches. An unusually small value and an unusually large value are separated from the rest of the observations by gaps.

In Figure 7.3, the class intervals are 4 inches wide (see Table 7.4). The center appears to be at about 67 inches, still consistent with the mean and median found in Lesson 5. However, there are only two peaks now, one at about 59 inches and the other at about 67 inches. The shortest person no longer seems to be an outlier because no gap exists on the graph between that value and the next smallest value, but the tallest orchestra member continues to appear to be an unusual value.

If class intervals are made too small, the average number of observations in each interval is small and subject to quite a bit of variation. This results in fluctuations in the height of the bars that may simply reflect random fluctuations in the data and not true distributional characteristics. In contrast, if the class intervals are too wide, important features of the data may be obscured. The histogram using 2-inch intervals appears to reflect the fluctuations in the data, whereas the one based on 4-inch intervals more clearly reflects the distributional characteristics. Thus, 4-inch intervals are the more appropriate choice. As a general rule, the number of classes is often set approximately to the square root of the number of data points. Using this rule, seven or eight classes would be about the right number because there are 62 orchestra members. This agrees with our choice of 4-inch intervals.
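The square-root rule is easy to check numerically. The sketch below is only a rough guide, not a fixed procedure; the function name `sqrt_rule_classes` is our own.

```python
import math

def sqrt_rule_classes(n):
    """Return the whole numbers just below and just above sqrt(n)."""
    root = math.sqrt(n)
    return math.floor(root), math.ceil(root)

n_members = 62
low, high = sqrt_rule_classes(n_members)
print(f"sqrt({n_members}) = {math.sqrt(n_members):.2f}, so about {low} or {high} classes")
```

For 62 observations the square root is roughly 7.87, which is why seven or eight classes is the suggested range. The eight 4-inch classes fall inside it, while the sixteen 2-inch classes are about twice as many as the rule suggests.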
