Introduction to Measures of Central Tendency for Numerical Data
We need a greater statistical vocabulary if we are to describe the distributions of dotplots and stem-and-leaf plots. In this lesson, we will begin to think about measures of the middle of the distribution.
Population Measures of Central Tendency
Measures of central tendency attempt to quantify the middle of the distribution. If we are working with the population, these measures are parameters. If we have a sample, the measures are statistics, which are estimates of the population parameters. There are many ways to measure the center of a distribution, and we will learn about the three most common: mean, median, and mode.
Mean
The mean, denoted by μ, is the most common measure of central tendency. The mean is the average of all population values. If a population has N members, the mean is
$$\mu = E(X) = \frac{1}{N}\sum_{i=1}^{N} x_i,$$
where $x_i$ is the value of the variable associated with the $i$th unit in the population. We have used two different notations to symbolize the mean. The first is the Greek letter μ, which has become a conventional representation of the mean. The other is E(X), representing the "expected value of X." The average of a random variable across the whole population is the mean or the expected value of that variable. Notice that the capital sigma (∑) is shorthand for adding a set of numbers. The terms to be added run from i = 1 (because "i = 1" is below the sigma) to i = N (because "N" is at the top of the sigma). The index, here i, is incremented by one in each subsequent term.
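The summation can be sketched in a few lines of Python. This is only an illustration: the helper name `population_mean` and the five values in `population` are made up here and are not the orchestra heights.

```python
def population_mean(values):
    """Return the mean: add x_1 through x_N, then divide by N."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty population is undefined")
    total = 0.0
    for x in values:  # one term per population unit, i = 1, ..., N
        total += x
    return total / n


# Hypothetical population of N = 5 values (illustration only).
population = [2, 4, 7, 9, 13]
mu = population_mean(population)
print(mu)  # 35 / 5 = 7.0
```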
Example
Find the mean height of the 62 high school orchestra members given in Dotplots and Stem-and-Leaf Plots: Study Guide.
Solution
The mean height for this population is
$$\mu = \frac{1}{62}\sum_{i=1}^{62} x_i = 66.4 \text{ inches}.$$
Although we have presented a formal equation for the mean, it is important to remember that the population mean is simply the average of all population values.
Median
The median, another measure of central tendency, is the middle value of the population distribution. To find the median, order all of the values in the population from smallest to largest and find the middle value. For example, suppose the following five values constituted the population distribution:
The middle value is the 7. Now suppose the following four values represent the population distribution:
Here, the middle value is somewhere between the 6 and the 9. The median is any value between 6 and 9; however, usually, the average of the two values, $\frac{6 + 9}{2} = 7.5$, is taken as the median value. Through these two illustrations, we can see that we find the median in a slightly different manner when there is an even number of observations in the distribution than when there is an odd number of observations. This can be written generally as follows.
If N, the number of values in the population, is odd, the median is the $\left(\frac{N+1}{2}\right)$th value in the ordered list of population values. If N is even, the median is any value between the $\left(\frac{N}{2}\right)$th and the $\left(\frac{N}{2}+1\right)$th values in the ordered list of population values; usually, the average of the two values is taken as the median. Note that this definition of population median is appropriate only if the number of population units is finite. For a continuous random variable, the median is still the middle value in the population, but we must use other methods to define it.
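The odd and even cases can be sketched in Python as follows. The function name `population_median` and both example lists are hypothetical; the lists are chosen only so that the middle values match the illustrations in the text (a middle value of 7, and middle values of 6 and 9).

```python
def population_median(values):
    """Median of a finite population, averaging the two middle values when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty population is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th value, which is index mid when counting from 0.
        return ordered[mid]
    # Even N: average the (N / 2)th and (N / 2 + 1)th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical populations (illustration only).
odd_population = [3, 5, 7, 10, 12]
even_population = [2, 6, 9, 14]
print(population_median(odd_population))   # 7
print(population_median(even_population))  # 7.5
```

Averaging the two middle values is the convention described above; any value between them would also satisfy the definition.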
Example
Find the median height of the 62 high school orchestra members. Compare the mean and median.
Solution
Because an even number of orchestra members exists, the median height is the average of the 31st and 32nd values in the ordered list of orchestra members' heights. The stem-and-leaf plot makes these values easy to find. Simply start at the top of the plot and count the number of leaves, always working from the stem out. Continue until the 31st and 32nd values have been identified. The 31st value is 66.6 inches, and the 32nd value is 67.1 inches. Any value between 66.6 and 67.1 is a median value. However, we will follow tradition and average the two:
$\frac{66.6 + 67.1}{2} = 66.85$. That is, the median orchestra member height is 66.85 inches.
Notice that in this case, the mean and median are close, but not identical. For the median, exactly half of the population values are less than 66.85 inches, and half are greater than 66.85 inches. For the mean, 29 of the values are less than the mean, and 33 are greater than the mean, which is still close to half of the values. Sometimes, the mean and median are much further apart. We will consider what differences in the mean and median indicate about the distribution in the next lesson.
Mode
The mode is the most frequently occurring value in the population and is another measure of central tendency. This measure tends to be most useful for discrete random variables. For the orchestra members' heights, three members have heights of 59.7 inches. Similarly, we have three other groups, each with three members, with heights of 68.7, 68.9, and 70.5 inches. Thus, there are four modes.
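Because a population can have more than one mode, a mode calculation should return every value that ties for the highest count. The sketch below assumes a hypothetical helper `modes` and a short made-up list of heights; it is not the full orchestra data set.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    if not values:
        raise ValueError("the mode of an empty population is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical heights in inches (illustration only).
heights = [59.7, 59.7, 59.7, 61.2, 64.0, 68.7, 68.7, 68.7, 70.5]
print(modes(heights))  # [59.7, 68.7]
```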
Which measure of central tendency is the best? Each provides slightly different information. The mean is the most common measure, but it is influenced by extreme values. One extreme value can have a big impact on the mean, especially if the population does not have many members. An unusually small population value may cause the mean to be quite a bit smaller than it would have been if that value were not in the population. Similarly, an unusually large population value may inflate the mean. In contrast, the median and mode are not affected much by these unusual values. The mode is often not useful because it may not be unique.
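A quick way to see this sensitivity is to replace one value with an extreme one and recompute both measures. The two lists below are hypothetical and use Python's standard `statistics` module.

```python
import statistics

# Hypothetical small population, and the same population with its
# largest value replaced by an extreme one (illustration only).
typical = [10, 12, 13, 15, 16]
with_extreme = [10, 12, 13, 15, 90]

mean_typical = statistics.mean(typical)
mean_extreme = statistics.mean(with_extreme)
median_typical = statistics.median(typical)
median_extreme = statistics.median(with_extreme)

print(mean_typical, mean_extreme)      # the mean rises from 13.2 to 28
print(median_typical, median_extreme)  # the median stays at 13
```

With only five members, a single extreme value more than doubles the mean, while the middle value of the ordered list is unchanged.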
Sample Measures of Central Tendency
The population mean, median, and mode are parameters. To find their values, we must know all of the population values. Unless the population is small, as was the case when our population of interest was the orchestra members at one specific high school, this rarely happens, so in practice we usually cannot find the population mean, median, or mode. Instead, we estimate them with the sample mean, sample median, and sample mode, which are simply the mean, median, and mode of the sample values. Each of these statistics is an estimator of its population counterpart. We will learn more about the characteristics of these estimates and their uses later. For now, let's concentrate on how to find them.
Example
Find the sample mean, sample median, and sample mode for the difference in the number of blinks while playing video games and the number of blinks during normal conversation. Which one do you think is the best measure of central tendency for these data?
Solution
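The sample mean, written $\bar{x}$, is the average of the sample values, that is, the average of the 15 sample differences listed below:
$$\bar{x} = \frac{1}{15}\sum_{i=1}^{15} x_i = \frac{356}{15} \approx 23.7.$$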
To find the median, we first order the sample values from smallest to largest:
13 13 17 17 18 19
21 24 26 26 27 29
32 34 40
Because there are 15 values in the sample, the middle value is the $\frac{15 + 1}{2} = 8$th value in the list; that is, the sample median is 24.
The sample mode is the most frequently occurring value in the sample. Here, the values of 13, 17, and 26 were each observed twice. In cases like this, where several values tie, the sample mode is of little value in measuring central tendency.
The mean and median provide good measures of the center of the distribution, but the mode does not.
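These three statistics can be checked with Python's standard `statistics` module, using the 15 ordered sample values listed in the solution. The variable names are chosen here for illustration, and `statistics.multimode` assumes Python 3.8 or later.

```python
import statistics

# The 15 ordered sample differences from the solution above.
blink_differences = [13, 13, 17, 17, 18, 19,
                     21, 24, 26, 26, 27, 29,
                     32, 34, 40]

sample_mean = statistics.mean(blink_differences)        # 356 / 15
sample_median = statistics.median(blink_differences)    # the 8th ordered value
sample_modes = statistics.multimode(blink_differences)  # every value tied for most frequent

print(round(sample_mean, 1))  # 23.7
print(sample_median)          # 24
print(sample_modes)           # [13, 17, 26]
```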
Measures of Central Tendency for Numerical Data In Short
Mean, median, and mode are three measures of central tendency. Each provides a measure of the middle value in the population. The mean is the average of all population values. The median is the middle of the population values. The mode is the most frequently occurring population value. The sample mean, sample median, and sample mode are sample estimates of the corresponding population parameters.
Find practice problems and solutions for these concepts at Measures of Central Tendency for Numerical Data Practice Exercises.