Practice problems for these concepts can be found at:

In the last example of the previous section, we said that the graph appeared to be *centered* about a height of 66". In this section, we talk about ways to describe the *center* of a distribution. There are two primary measures of center: the **mean** and the **median**. There is a third measure, the **mode**, but it tells where the most frequent values occur more than it describes the center. In some distributions, the mean, median, and mode will be close in value, but the mode can appear at any point in the distribution.

### Mean

Let *x*_{i} represent any value in a set of *n* values (*i* = 1, 2,…, *n*). The mean of the set is defined as the sum of the *x*'s divided by *n*. Symbolically, $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$. Usually, the indices on the summation symbol in the numerator are left out and the expression is simplified to $\bar{x} = \dfrac{\sum x}{n}$.

**Σ** *x* means "the sum of *x*" and is defined as follows: **Σ** *x* = *x*_{1} + *x*_{2} +… + *x*_{n}. Think of it as the "add-'em-up" symbol to help remember what it means. $\bar{x}$ is used for a mean based on a sample (a statistic). In the event that you have access to an entire distribution (such as in Chapters 9 and 10), its mean is symbolized by the Greek letter μ.

(*Note*: in the previous chapter, we made a distinction between statistics, which are values that describe sample data, and *parameters*, which are values that describe populations. Unless we are clear that we have access to an entire population, or that we are discussing a distribution, we use the statistics rather than parameters.)

**example:** During his major league career, Babe Ruth hit the following number of home runs (1914–1935): 0, 4, 3, 2, 11, 29, 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22, 6. What was the *mean* number of home runs per year for his major league career?
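The definition of the mean can be applied directly to this list: the 22 season totals add up to 714, so the mean is 714/22, or about 32.45 home runs per year. A minimal Python sketch of that arithmetic follows; the names `home_runs` and `sample_mean` are illustrative only and do not come from the text.

```python
# Babe Ruth's home runs per season, 1914-1935, as listed in the example.
home_runs = [0, 4, 3, 2, 11, 29, 54, 59, 35, 41, 46, 25,
             47, 60, 54, 46, 49, 46, 41, 34, 22, 6]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(home_runs)           # the "add-'em-up" step
n = len(home_runs)               # number of seasons
x_bar = sample_mean(home_runs)   # mean home runs per season

print(total, n, round(x_bar, 2))  # prints: 714 22 32.45
```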

### Median

The **median** of an ordered dataset is the "middle" value in the set. If the dataset has an odd number of values, the *median* is a member of the set and is the middle value. If there are 3 values, the median is the second value. If there are 5, it is the third, etc. If the dataset has an even number of values, the median is the mean of the two middle numbers. If there are 4 values, the median is the mean of the second and third values. In general, if there are *n* values in the ordered dataset, the *median* is at the $\frac{n+1}{2}$ position. If you have 28 terms in order, you will find the median at the $\frac{28+1}{2}$ = 14.5th position (that is, between the 14th and 15th terms). Be careful not to interpret $\frac{n+1}{2}$ as the value of the median rather than as the location of the median.
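The difference between the *location* of the median and its *value* can be made concrete with a short Python sketch. The function name `median_by_position` and the small test lists are made up for illustration; the function returns both the 1-based position (*n* + 1)/2 and the median itself.

```python
def median_by_position(values):
    """Return (position, median) for a list of numbers.

    position is the 1-based location (n + 1) / 2 in the ordered data;
    it is a location, not the value of the median.
    """
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the position is a whole number, so the median is in the set.
        return position, ordered[int(position) - 1]
    # Even n: the position falls halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return position, (lower + upper) / 2


print(median_by_position([7, 1, 5]))     # prints: (2.0, 5)
print(median_by_position([8, 2, 6, 4]))  # prints: (2.5, 5.0)
```

With 28 ordered terms the returned position is 14.5, which matches the text: the median lies between the 14th and 15th terms.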

**example:** Consider once again the data in the previous example from Babe Ruth's career. What was the *median* number of home runs per year he hit during his major league career?

**solution:** First, put the numbers in order from smallest to largest: 0, 2, 3, 4, 6, 11, 22, 25, 29, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60. There are 22 scores, so the median is found at the 11.5th position, between the 11th and 12th scores (35 and 41). So the *median* is $\dfrac{35 + 41}{2} = 38$.

The 1-Var Stats procedure, described in the previous Calculator Tip box, will give you the median if you scroll down to the second screen of output (it appears as part of the five-number summary of the data: minimum, lower quartile, median, upper quartile, maximum).

### Resistant

Although the mean and median are both measures of center, the choice of which to use depends on the shape of the distribution. If the distribution is symmetric and mound-shaped, the mean and median will be close. However, if the distribution has outliers or is strongly skewed, the median is probably the better choice to describe the center. This is because the median is a **resistant statistic**, one whose numerical value is not dramatically affected by extreme values, while the mean is not resistant.

**example:** A group of five teachers in a small school have salaries of $32,700, $32,700, $38,500, $41,600, and $44,500. The mean and median salaries for these teachers are $38,000 and $38,500, respectively. Suppose the highest-paid teacher gets sick, and the school superintendent volunteers to substitute for her. The superintendent's salary is $174,300. If you replace the $44,500 salary with the $174,300 one, the median doesn't change at all (it's still $38,500), but the new mean is $63,960. Almost everybody is below average if, by "average," you mean *mean*. It's sort of like Lake Wobegon, where all of the children are expected to be above average.
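The salary example can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics` module; the variable names are illustrative. It recomputes both measures from the listed salaries before and after the superintendent's salary replaces the highest one.

```python
from statistics import mean, median

# Salaries of the five teachers, as listed in the example.
salaries = [32_700, 32_700, 38_500, 41_600, 44_500]

# Replace the highest teacher salary with the superintendent's salary.
with_superintendent = salaries[:-1] + [174_300]

for label, data in [("teachers only", salaries),
                    ("with superintendent", with_superintendent)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")

# How many of the five salaries fall below the new mean?
below = sum(1 for s in with_superintendent if s < mean(with_superintendent))
print(f"{below} of {len(with_superintendent)} salaries are below the new mean")
```

The median is the same in both lines of output, while the mean shifts toward the single extreme value, which is what it means for the median to be resistant and the mean not.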

**example:** For the graph given below, would you expect the mean or median to be larger? Why?

**solution:** You would expect the median to be larger than the mean. Because the graph is skewed to the left, and the mean is not resistant, you would expect the mean to be pulled to the left (in fact, the dataset from which this graph was drawn has a mean of 5.4 and a median of 6, as expected, given the skewness).
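The same effect can be seen numerically. The sketch below uses a small, made-up left-skewed dataset (it is *not* the data behind the graph, which is not given here) to show the long left tail pulling the mean below the median.

```python
from statistics import mean, median

# Hypothetical left-skewed data: most values cluster at 6-8,
# with a thin tail stretching down to 1.
left_skewed = [1, 3, 5, 6, 6, 7, 7, 7, 8, 8]

m = mean(left_skewed)
med = median(left_skewed)

print(f"mean = {m}, median = {med}")  # prints: mean = 5.8, median = 6.5
print("mean is smaller than median:", m < med)
```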
