# Histograms and Boxplots Study Guide

## Introduction to Histograms and Boxplots

Numerical data may be discrete or continuous. In this lesson, we will discuss how to present the distributions of discrete and continuous random variables in tabular form. Then we will learn how to display numerical data using histograms and boxplots. Discrete data most frequently arise from counting. In these cases, each observation is a whole number; however, not all discrete data consist solely of whole numbers. In contrast, a continuous random variable can take on any value in one or more intervals on the number line. Because a discrete random variable may assume only a finite or countably infinite number of values, whereas a continuous random variable can assume any of an uncountably infinite number of values, we sometimes have to present data arising from these two types of random variables differently.

## Tabular Displays of Discrete Distributions

Earlier, we organized categorical data in tabular form, presenting the frequency and/or relative frequency for each category. We could not compute the cumulative relative frequency when working with categorical data because the categories had no natural ordering. If the number of possible values of a discrete random variable is finite, its population distribution can be displayed in a table, much as we did for a categorical random variable. For each possible value of the discrete random variable, the frequency or relative frequency is presented. Because the values of discrete numerical data have an order (e.g., one is less than two, which is less than three), we may also want to record the cumulative relative frequencies in the table. The cumulative relative frequency of i is the number of observations with a value of i or less, divided by the total number of observations. If we have a sample from a discrete distribution and not the whole population, we can display the sample distribution in tabular form. For each observed value, we would have the frequency, relative frequency, and perhaps the cumulative relative frequency.
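The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the ten-value sample are hypothetical and do not come from the lesson.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(observations):
    """Return one row per distinct value, in increasing order:
    (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    values = sorted(counts)  # discrete numerical values have a natural order
    running = accumulate(counts[v] for v in values)  # observations with value <= v
    return [(v, counts[v], counts[v] / n, c / n) for v, c in zip(values, running)]


# Hypothetical sample of ten counts, made up for illustration only.
sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]

print(f"{'value':>5} {'freq':>5} {'rel':>6} {'cum rel':>8}")
for value, freq, rel, cum in frequency_table(sample):
    print(f"{value:>5} {freq:>5} {rel:>6.2f} {cum:>8.2f}")
```

Sorting the distinct values before accumulating is what makes the cumulative column meaningful. With unordered categories there is no sensible sort order, which is why the cumulative relative frequency is not computed for categorical data.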

#### Example

A researcher named John L. Hoogland studied the mating behavior of Gunnison's prairie dogs at Petrified Forest National Park in Arizona for seven years, from 1989 to 1995. In 1998, he published an article titled "Why do female Gunnison's prairie dogs copulate with more than one male?" in Animal Behaviour (55:351–359). Each year, all adult and juvenile prairie dogs at the 14-hectare study site were captured and marked. Mating season begins in mid-March and ends in early April. However, a female usually accepts partners on only one day of the breeding season. The number of partners accepted by each female prairie dog during a breeding season was recorded. During this seven-year period, 87, 93, 61, 17, and 5 female prairie dogs accepted one, two, three, four, and five partners, respectively.

1. Is this an observational study or an experiment? Explain.
2. Present the sample distribution in tabular form.

#### Solution

1. All female prairie dogs that could be observed were observed in the study area, so there was no random selection of prairie dogs from some larger population. The number of partners could not be assigned at random, so there was no random assignment of treatments. Therefore, this is an observational study.
2. The sample distribution is shown in Table 7.1.

From the table, we see that 87 of the 263 female prairie dogs accepted one partner, but only five accepted five partners. Further, 93 (or 35%) of the females accepted two partners. As in Lesson 3, we have a rounding error: the rounded relative frequencies sum to 0.99 rather than 1.0. Again, we report the total as 1.0, not 0.99. It is better to report each relative frequency accurately than to alter entries so that a column or row adds up exactly. From the cumulative relative frequency column, we see that 68% of the prairie dogs accepted one or two partners.
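As a check on the figures quoted above, the following is a minimal sketch that recomputes the three columns from the counts given in the example. The variable names are illustrative, and rounding to two decimal places is an assumption based on the percentages quoted in the text.

```python
# Counts from the example: number of partners -> number of females.
counts = {1: 87, 2: 93, 3: 61, 4: 17, 5: 5}
total = sum(counts.values())  # 263 females in all

rows = []
running = 0  # number of females with this many partners or fewer
for partners in sorted(counts):
    running += counts[partners]
    rel = round(counts[partners] / total, 2)
    # Cumulative value comes from the running count, not from adding
    # rounded relative frequencies, so the last entry is exactly 1.0.
    cum = round(running / total, 2)
    rows.append((partners, counts[partners], rel, cum))

print(f"{'partners':>8} {'freq':>5} {'rel':>6} {'cum rel':>8}")
for partners, freq, rel, cum in rows:
    print(f"{partners:>8} {freq:>5} {rel:>6.2f} {cum:>8.2f}")

# The rounded relative frequencies need not add up to exactly 1.
rounded_sum = round(sum(rel for _, _, rel, _ in rows), 2)
print("sum of rounded relative frequencies:", rounded_sum)
```

Computing each cumulative entry from the running count rather than from the rounded column keeps rounding error from accumulating down the table.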

