Truncation
The process of truncation is a method of approximating numbers denoted as decimal expansions. It involves the deletion of all the numerals to the right of a certain point in the decimal part of an expression. Some electronic calculators use truncation to fit numbers within their displays. For example, the number 3.830175692803 can be shortened in steps as follows:
3.830175692803
3.83017569280
3.8301756928
3.830175692
3.83017569
3.8301756
3.830175
3.83017
3.8301
3.830
3.83
3.8
3
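The step-by-step truncation described above can be sketched in Python. This is only an illustration: the function name `truncation_steps` is made up for this example, and the number is handled as a string so that no floating-point error creeps in.

```python
def truncation_steps(text):
    """Return the successive truncations of a decimal string, one digit at a time."""
    steps = [text]
    while "." in text:
        text = text[:-1]            # delete the rightmost digit
        if text.endswith("."):      # no decimal digits left, so drop the point too
            text = text[:-1]
        steps.append(text)
    return steps

steps = truncation_steps("3.830175692803")
for step in steps:
    print(step)
# The last line printed is 3
```

Note that truncation never changes the digits that remain; it only discards digits.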

Rounding
Rounding is the preferred method of approximating numbers denoted as decimal expansions. In this process, when a given digit (call it r) is deleted at the right-hand extreme of an expression, the digit q to its left (which becomes the new r after the old r is deleted) is not changed if 0 ≤ r ≤ 4. If 5 ≤ r ≤ 9, then q is increased by 1 ("rounded up"). Most electronic calculators use rounding rather than truncation. If rounding is used, the number 3.830175692803 can be shortened in steps as follows:
3.830175692803
3.83017569280
3.8301756928
3.830175693
3.83017569
3.8301757
3.830176
3.83018
3.8302
3.830
3.83
3.8
4
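The digit-by-digit rounding rule can be sketched with Python's `decimal` module. This is one possible approach, not the only one: removing a single digit with the `ROUND_HALF_UP` mode behaves like the rule above (leave q alone for r from 0 to 4, raise it for r from 5 to 9). The function name `rounding_steps` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def rounding_steps(text):
    """Round a decimal string one digit at a time, from the right."""
    value = Decimal(text)
    places = -value.as_tuple().exponent        # digits after the decimal point
    steps = [str(value)]
    for keep in range(places - 1, -1, -1):
        quantum = Decimal(1).scaleb(-keep)     # 10**-keep, e.g. 0.01 when keep == 2
        value = value.quantize(quantum, rounding=ROUND_HALF_UP)
        steps.append(str(value))
    return steps

steps = rounding_steps("3.830175692803")
for step in steps:
    print(step)
# The last line printed is 4
```

Unlike truncation, rounding can change the retained digits, which is why the final value here is 4 rather than 3.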

Cumulative Absolute Frequency
When data are tabulated, the absolute frequencies are often shown in one or more columns. Look at Table 2-5, for example. This shows the results of the tosses of the blue die in the experiment we looked at a while ago. The first column shows the number on the die face. The second column shows the absolute frequency for each face, or the number of times each face turned up during the experiment. The third column shows the cumulative absolute frequency, which is the sum of all the absolute frequency values in table cells at or above the given position.
The cumulative absolute frequency numbers in a table always ascend (increase) as you go down the column. The total cumulative absolute frequency should be equal to the sum of all the individual absolute frequency numbers. In this instance, it is 6000, the number of times the blue die was tossed.

Cumulative Relative Frequency
Relative frequency values can be added up down the columns of a table, in exactly the same way as the absolute frequency values are added up. When this is done, the resulting values, usually expressed as percentages, show the cumulative relative frequency.
Examine Table 2-6. This is a more detailed analysis of what happened with the blue die in the above-mentioned experiment. The first, second, and fourth columns in Table 2-6 are identical with the first, second, and third columns in Table 2-5. The third column in Table 2-6 shows the percentage represented by each absolute frequency number. These percentages are obtained by dividing the number in the second column by 6000, the total number of tosses, and multiplying the result by 100. The fifth column shows the cumulative relative frequency, which is the sum of all the relative frequency values in table cells at or above the given position.
The cumulative relative frequency percentages in a table, like the cumulative absolute frequency numbers, always ascend as you go down the column. The total cumulative relative frequency should be equal to 100%. In this sense, the cumulative relative frequency column in a table can serve as a checksum, helping to ensure that the entries have been tabulated correctly.
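The column-building steps can be sketched in Python. The die-tossing counts from Tables 2-5 and 2-6 are not reproduced in this text, so this sketch borrows the absolute frequencies from the 10-question test discussed under Mean below. The procedure is the same for any frequency table.

```python
from itertools import accumulate

# Scores and absolute frequencies of the 10-question test (not the die data).
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
absolute = [5, 6, 19, 17, 18, 11, 6, 4, 4, 7, 3]

total = sum(absolute)
cumulative_absolute = list(accumulate(absolute))       # running sum down the column
relative = [100 * f / total for f in absolute]         # percentages
cumulative_relative = [100 * c / total for c in cumulative_absolute]

for row in zip(scores, absolute, cumulative_absolute, relative, cumulative_relative):
    print("{:>5} {:>5} {:>5} {:>6.1f}% {:>6.1f}%".format(*row))

# Checksums: the last cumulative entries must equal the total and 100%.
print(cumulative_absolute[-1], cumulative_relative[-1])   # 100 100.0
```

Computing the cumulative percentages from the cumulative counts, rather than by adding rounded percentages, keeps the final entry at exactly 100%.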

Mean
The mean for a discrete variable in a distribution is the mathematical average of all the values. If the variable is considered over the entire population, the average is called the population mean. If the variable is considered over a particular sample of a population, the average is called the sample mean. There can be only one population mean for a population, but there can be many different sample means. The mean is often denoted by the lowercase Greek letter mu, in italics (μ). Sometimes it is also denoted by an italicized lowercase English letter, usually x, with a bar (vinculum) over it.
Table 2-7 shows the results of a 10-question test, given to a class of 100 students. As you can see, every possible score is accounted for. There are some people who answered all 10 questions correctly; there are some who did not get a single answer right. In order to determine the mean score for the whole class on this test – that is, the population mean, called μp – we must add up the scores of each and every student, and then divide by 100. First, let's sum the products of the numbers in the first and second columns. This will give us 100 times the population mean:

(10 × 5) + (9 × 6) + (8 × 19) + (7 × 17) + (6 × 18) + (5 × 11) + (4 × 6) + (3 × 4) + (2 × 4) + (1 × 7) + (0 × 3)
= 50 + 54 + 152 + 119 + 108 + 55 + 24 + 12 + 8 + 7 + 0
= 589
Dividing this by 100, the total number of test scores (one for each student who turns in a paper), we obtain μp = 589/100 = 5.89.
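The same arithmetic can be checked with a few lines of Python. The variable names are chosen for this illustration only; the scores and frequencies are the ones used in the sum above.

```python
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [5, 6, 19, 17, 18, 11, 6, 4, 4, 7, 3]   # number of papers with each score

total_points = sum(s * f for s, f in zip(scores, freqs))   # sum of score x frequency
n = sum(freqs)                                             # number of papers
population_mean = total_points / n

print(total_points, n, population_mean)   # 589 100 5.89
```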
The teacher in this class has assigned letter grades to each score. Students who scored 9 or 10 correct received grades of A; students who got scores of 7 or 8 received grades of B; those who got scores of 5 or 6 got grades of C; those who got scores of 3 or 4 got grades of D; those who got fewer than 3 correct answers received grades of F. The assignment of grades, informally known as the "curve," is a matter of teacher temperament and doubtless would seem arbitrary to the students who took this test. (Some people might consider the "curve" in this case to be overly lenient, while a few might think it is too severe.)
Median
If the number of elements in a distribution is even, then the median is the value such that half the elements have values greater than or equal to it, and half the elements have values less than or equal to it. If the number of elements is odd, then the median is the value such that the number of elements having values greater than or equal to it is the same as the number of elements having values less than or equal to it. The word "median" is synonymous with "middle."
Table 2-8 shows the results of the 10-question test described above, but the third column shows the cumulative absolute frequency instead of letter grades. The tally is begun with the top-scoring papers and proceeds in order downward. (It could just as well be done the other way, starting with the lowest-scoring papers and proceeding upward.) When the scores of all 100 individual papers are tallied this way, so they are in order, the scores of the 50th and 51st papers (the two in the middle) are found to be 6 correct. Thus, the median score is 6, because half the students scored 6 or above, and the other half scored 6 or below.
It's possible that in another group of 100 students taking this same test, the 50th paper would have a score of 6 while the 51st paper would have a score of 5. When two values "compete," the median is equal to their average. In this case it would be midway between 5 and 6, or 5.5.
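Both cases can be sketched in Python. The function below is a plain illustration of the "middle value" idea. The second data set is a made-up stand-in for the other group of students, chosen only so that the 50th and 51st papers differ.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [5, 6, 19, 17, 18, 11, 6, 4, 4, 7, 3]
papers = [s for s, f in zip(scores, freqs) for _ in range(f)]   # one entry per paper

print(median(papers))              # 6.0

# Hypothetical second group: 50 papers scoring 6 and 50 papers scoring 5.
other_group = [6] * 50 + [5] * 50
print(median(other_group))         # 5.5
```

Python's standard library also provides `statistics.median`, which follows the same convention for an even number of values.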

Mode
The mode for a discrete variable is the value that occurs the most often. In the test whose results are shown in Table 2-7, the most "popular" or often occurring score is 8 correct answers. There were 19 papers with this score. No other score had that many results. Therefore, the mode in this case is 8.

Suppose that another group of students took this test, and there were two scores that occurred equally often. For example, suppose 16 students got 8 answers right, and 16 students also got 6 answers right. In this case there are two modes: 6 and 8. This sort of distribution is called a bimodal distribution.
Now imagine there are only 99 students in a class, and there are exactly 9 students who get each of the 11 possible scores (from 0 to 10 correct answers). In this distribution, there is no mode. Or, we might say, the mode is not defined.
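The three situations (one mode, two modes, no mode) can be sketched in Python. The function name `modes` is invented for this example, and the bimodal frequency table is a hypothetical one, since the text specifies only that scores 6 and 8 each occur 16 times.

```python
def modes(frequencies):
    """Return the most frequent values, or an empty list if all values tie."""
    top = max(frequencies.values())
    winners = sorted(v for v, count in frequencies.items() if count == top)
    if len(frequencies) > 1 and len(winners) == len(frequencies):
        return []                       # every value equally common: mode not defined
    return winners

# Frequencies from the 10-question test (score -> number of papers).
test_freqs = {10: 5, 9: 6, 8: 19, 7: 17, 6: 18, 5: 11, 4: 6, 3: 4, 2: 4, 1: 7, 0: 3}
print(modes(test_freqs))                # [8]

# Hypothetical bimodal class: 6 and 8 both occur 16 times.
bimodal = {8: 16, 7: 10, 6: 16, 5: 9}
print(modes(bimodal))                   # [6, 8]

# 99 students, 9 papers for each of the 11 possible scores.
uniform = {score: 9 for score in range(11)}
print(modes(uniform))                   # []
```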
The mean, median, and mode are sometimes called measures of central tendency. This is because they indicate a sort of "center of gravity" for the values in a data set.
Variance
There is still another way in which the nature of a distribution can be described. This is a measure of the extent to which the values are spread out. There is something inherently different about a distribution of test scores like those portrayed in Table 2-7, compared with a distribution where every score is almost equally "popular." The test results portrayed in Table 2-7 are also qualitatively different from a distribution where almost every student got the same score, say 7 answers correct.

In the scenario of Table 2-7, call the variable x, and let the 100 individual scores be called $x_1$ through $x_{100}$. Suppose we find the extent to which each individual score $x_i$ (where i is an integer from 1 to 100) differs from the mean score for the whole population ($\mu_p$). This gives us 100 "distances from the mean," $d_1$ through $d_{100}$, as follows:
$$d_1 = |x_1 - \mu_p|,\quad d_2 = |x_2 - \mu_p|,\quad \ldots,\quad d_{100} = |x_{100} - \mu_p|$$
The vertical lines on each side of an expression represent the absolute value. For any real number r, |r| = r if r ≥ 0, and |r| = –r if r < 0. The absolute value of a number is the extent to which it differs from 0. It avoids the occurrence of negative numbers as the result of a mathematical process.
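Now, let's square each of these "distances from the mean," getting this set of numbers:
$$d_1^2 = (x_1 - \mu_p)^2,\quad d_2^2 = (x_2 - \mu_p)^2,\quad \ldots,\quad d_{100}^2 = (x_{100} - \mu_p)^2$$
The absolute-value signs are not needed in these expressions, because for any real number r, $r^2$ is never negative.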
Next, let's average all the "squares of the distances from the mean," $d_i^2$. This means we add them all up, and then divide by 100, the total number of scores, obtaining the "average of the squares of the distances from the mean." This is called the variance of the variable x, written Var(x):
$$\mathrm{Var}(x) = \frac{1}{100}\left(d_1^2 + d_2^2 + \cdots + d_{100}^2\right) = \frac{1}{100}\left[(x_1 - \mu_p)^2 + (x_2 - \mu_p)^2 + \cdots + (x_{100} - \mu_p)^2\right]$$
The variance of a set of n values whose population mean is $\mu_p$ is given by the following formula:
$$\mathrm{Var}(x) = \frac{1}{n}\left[(x_1 - \mu_p)^2 + (x_2 - \mu_p)^2 + \cdots + (x_n - \mu_p)^2\right]$$
Standard Deviation
Standard deviation, like variance, is an expression of the extent to which values are spread out with respect to the mean. The standard deviation is the square root of the variance, and is symbolized by the italicized, lowercase Greek letter sigma (σ). (Conversely, the variance is equal to the square of the standard deviation, and is often symbolized $\sigma^2$.) In the scenario of our test:
$$\sigma = \left[\frac{1}{100}\left(d_1^2 + d_2^2 + \cdots + d_{100}^2\right)\right]^{1/2} = \left\{\frac{1}{100}\left[(x_1 - \mu_p)^2 + (x_2 - \mu_p)^2 + \cdots + (x_{100} - \mu_p)^2\right]\right\}^{1/2}$$
The standard deviation of a set of n values whose population mean is $\mu_p$ is given by the following formula:
$$\sigma = \left\{\frac{1}{n}\left[(x_1 - \mu_p)^2 + (x_2 - \mu_p)^2 + \cdots + (x_n - \mu_p)^2\right]\right\}^{1/2}$$
These expressions are a little messy. It's easier for some people to remember these verbal definitions:
- Variance is the average of the squares of the "distances" of each value from the mean.
- Standard deviation is the square root of the variance.
Variance and standard deviation are sometimes called measures of dispersion. In this sense the term "dispersion" means "spread-outedness."
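The two verbal definitions translate almost word for word into Python. The sketch below is an illustration rather than a reference implementation. The function names are chosen for this example, and the data are the 100 test scores, rebuilt from the frequencies given under Mean.

```python
import math
import statistics

scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [5, 6, 19, 17, 18, 11, 6, 4, 4, 7, 3]
values = [s for s, f in zip(scores, freqs) for _ in range(f)]   # one entry per paper

def variance(values):
    """Average of the squares of the distances of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

print(round(variance(values), 3), round(standard_deviation(values), 2))

# Cross-check against the standard library's population formulas.
print(round(statistics.pvariance(values), 3), round(statistics.pstdev(values), 2))
```

Carried at full precision, this gives a variance of about 6.438 and a standard deviation of about 2.54. Hand-worked figures can differ slightly in the later decimal places when intermediate values are rounded.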
Statistics Definitions Practice Problems
Practice 1
What are the sample means for each grade in the test whose results are tabulated in Table 2-7? Use rounding to determine the answers to two decimal places.

Solution 1
Let's call the sample means μsa for the grade of A, μsb for the grade of B, and so on down to μsf for the grade of F.
To calculate μsa, note that 5 students received scores of 10, while 6 students got scores of 9, both scores good enough for an A. This is a total of 5 + 6, or 11, students getting the grade of A. Therefore:
μsa = [(5 × 10) + (6 × 9)]/11 = 104/11 ≈ 9.45
To find μsb, observe that 19 students scored 8, and 17 students scored 7. Thus, 19 + 17, or 36, students received grades of B. Calculating:
μsb = [(19 × 8) + (17 × 7)]/36 = 271/36 ≈ 7.53
To determine μsc, check the table to see that 18 students scored 6, while 11 students scored 5. Therefore, 18 + 11, or 29, students did well enough for a C. Grinding out the numbers yields this result:
μsc = [(18 × 6) + (11 × 5)]/29 = 163/29 ≈ 5.62
To calculate μsd, note that 6 students scored 4, while 4 students scored 3. This means that 6 + 4, or 10, students got grades of D:
μsd = [(6 × 4) + (4 × 3)]/10 = 36/10 = 3.60
Finally, we determine μsf. Observe that 4 students got scores of 2, 7 students got scores of 1, and 3 students got scores of 0. Thus, 4 + 7 + 3, or 14, students failed the test:
μsf = [(4 × 2) + (7 × 1) + (3 × 0)]/14 = 15/14 ≈ 1.07
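The five sample means can be checked with a short Python sketch. The dictionary layout and variable names are choices made for this illustration. The grade boundaries and frequencies are the ones stated in the text.

```python
# Score -> number of papers, from the 10-question test.
freqs = {10: 5, 9: 6, 8: 19, 7: 17, 6: 18, 5: 11, 4: 6, 3: 4, 2: 4, 1: 7, 0: 3}

# Scores that earn each letter grade.
grades = {"A": [10, 9], "B": [8, 7], "C": [6, 5], "D": [4, 3], "F": [2, 1, 0]}

sample_means = {}
for grade, grade_scores in grades.items():
    count = sum(freqs[s] for s in grade_scores)          # students with this grade
    points = sum(s * freqs[s] for s in grade_scores)     # their combined score
    sample_means[grade] = round(points / count, 2)       # rounded to two places

print(sample_means)
# {'A': 9.45, 'B': 7.53, 'C': 5.62, 'D': 3.6, 'F': 1.07}
```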
Practice 2
Draw a vertical bar graph showing all the absolute-frequency data from Table 2-5, the results of a "weighted" die-tossing experiment. Portray each die face on the horizontal axis. Let light gray vertical bars show the absolute frequency numbers, and let dark gray vertical bars show the cumulative absolute frequency numbers.

Solution 2
Figure 2-6 shows such a graph. The numerical data is not listed at the tops of the bars in order to avoid excessive clutter.

Fig. 2-6. Illustration for Practice 2.
Practice 3
Draw a horizontal bar graph showing all the relative-frequency data from Table 2-6, another portrayal of the results of a "weighted" die-tossing experiment. Show each die face on the vertical axis. Let light gray horizontal bars show the relative frequency percentages, and dark gray horizontal bars show the cumulative relative frequency percentages.

Solution 3
Figure 2-7 is an example of such a graph. Again, the numerical data is not listed at the ends of the bars, in the interest of neatness.

Fig. 2-7. Illustration for Practice 3.
Practice 4
Draw a point-to-point graph showing the absolute frequencies of the 10-question test described by Table 2-7. Mark the population mean, the median, and the mode with distinctive vertical lines, and label them.

Solution 4
Figure 2-8 is an example of such a graph. Numerical data is included for the population mean, median, and mode.

Fig. 2-8. Illustration for Practice 4.
Practice 5
Calculate the variance, Var(x), for the 100 test scores tabulated in Table 2-7.

Solution 5
Recall that the population mean, $\mu_p$, as determined above, is 5.89. Table 2-9 shows the "distances" of each score from $\mu_p$, the squares of these "distances," and the products of each of these squares with the absolute frequencies (the number of papers having each score from 0 to 10). At the bottom of the table, these products are all summed. The resulting number, 643.58, is 100 times the variance. Therefore:
$$\mathrm{Var}(x) = 643.58/100 = 6.4358$$
Table 2-9. "Distances" of each test score $x_i$ from the population mean $\mu_p$, the squares of these "distances," the products of the squares with the absolute frequencies $f_i$, and the sum of these products. This information is used in Solution 5.
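Table 2-9 itself is not reproduced here, but its columns can be rebuilt in Python. The sketch below rests on an assumption: that each squared "distance" was rounded to two decimal places before being multiplied by its frequency. Under that assumption the products sum to 643.58, matching the figure quoted in Solution 5. With unrounded squares the sum is 643.79. The `decimal` module is used so that the sums come out exact.

```python
from decimal import Decimal

mean = Decimal("5.89")
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [5, 6, 19, 17, 18, 11, 6, 4, 4, 7, 3]

exact_sum = Decimal(0)
rounded_sum = Decimal(0)
for x, f in zip(scores, freqs):
    distance = abs(x - mean)                        # |x_i - mu_p|
    square = distance ** 2                          # exact squared distance
    square_2dp = square.quantize(Decimal("0.01"))   # assumed two-place table entry
    exact_sum += f * square
    rounded_sum += f * square_2dp
    print(x, f, distance, square_2dp, f * square_2dp)

print(rounded_sum, rounded_sum / 100)   # 643.58 6.4358
print(exact_sum, exact_sum / 100)       # 643.7900 6.437900
```

Either way, the variance is about 6.44 and its square root rounds to 2.54, so the rounding of the table entries does not affect the two-decimal result for the standard deviation.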

Practice 6
Calculate the standard deviation for the 100 test scores tabulated in Table 2-7.

Solution 6
The standard deviation, σ, is the square root of the variance. Approximating:
$$\sigma = (643.58/100)^{1/2} = (6.4358)^{1/2} \approx 2.5369$$
It is reasonable to round this off to 2.54.