Distribution
A distribution is a description of the set of possible values that a random variable can take, together with how often each value occurs. This can be done by noting the absolute or relative frequency of each value. A distribution can be illustrated in terms of a table or a graph.
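The Python sketch below shows the difference between absolute and relative frequency. It divides each count by the total number of observations. The counts are invented for the example (a die tossed 60 times) and do not come from any table in this chapter. The helper name `relative_frequencies` is hypothetical.

```python
def relative_frequencies(counts):
    """Convert absolute frequencies (raw counts) into relative frequencies (fractions of the total)."""
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical absolute frequencies for a die tossed 60 times (not data from the text).
counts = {1: 9, 2: 11, 3: 10, 4: 10, 5: 8, 6: 12}
rel = relative_frequencies(counts)
for face, r in rel.items():
    print(face, counts[face], f"{r:.2%}")
```

The relative frequencies always add up to 1 (100%), whatever the total number of observations.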
Discrete Versus Continuous
Table 2-1 shows the results of a single, hypothetical experiment in which a die is tossed 6000 times. Figure 2-3 is a vertical bar graph showing the same data as Table 2-1. Both the table and the graph are distributions that describe the behavior of the die. If the experiment is repeated, the results will differ. If a huge number of experiments is carried out, assuming the die is not "weighted," the relative frequency of each face (number) turning up will approach 1 in 6, or approximately 16.67%.
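A quick simulation shows the relative frequencies moving toward 1 in 6. This minimal sketch assumes Python's `random` module is a fair stand-in for an unweighted die. It does not reproduce the numbers in Table 2-1. The seed value and the helper name `toss_die` are arbitrary choices for illustration.

```python
import random
from collections import Counter

def toss_die(n_tosses, rng):
    """Toss a fair six-sided die n_tosses times; return a Counter of face frequencies."""
    return Counter(rng.randint(1, 6) for _ in range(n_tosses))

rng = random.Random(2004)  # fixed, arbitrary seed so the run is repeatable

# One "experiment" of 6000 tosses, as in the text.
counts = toss_die(6000, rng)
for face in range(1, 7):
    print(face, counts[face], f"{counts[face] / 6000:.2%}")

# With far more tosses, every relative frequency tends toward 1/6 (about 16.67%).
big = toss_die(600_000, rng)
max_dev = max(abs(big[face] / 600_000 - 1 / 6) for face in range(1, 7))
print("largest deviation from 1/6:", max_dev)
```

Rerunning with a different seed changes the 6000-toss counts, just as repeating the physical experiment would. The large run stays close to 16.67% for every face.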
Table 2-2 shows the number of days during the course of a 365-day year in which measurable precipitation occurs within the city limits of five different hypothetical towns. Figure 2-4 is a horizontal bar graph showing the same data as Table 2-2. Again, both the table and the graph are distributions. If the same experiment were carried out for several years in a row, the results would differ from year to year. Over a period of many years, the relative frequencies would converge towards certain values, although long-term climate change might have effects not predictable or knowable in our lifetimes.
Both of the preceding examples involve discrete variables. When a distribution is shown for a continuous variable, a graph must be used, because such a variable can take on infinitely many values that cannot all be listed in a table. Figure 2-5 is a distribution that shows the relative amount of solar energy available per day over the course of a calendar year at a hypothetical city in the northern hemisphere.
Frequency Distribution
In both of the above examples (the first showing the results of 6000 die tosses and the second showing the days with precipitation in five hypothetical towns), the scenarios are portrayed with frequency as the dependent variable. This is true of the tables as well as the graphs. Whenever frequency is portrayed as the dependent variable in a distribution, that distribution is called a frequency distribution.
Suppose we complicate the situation involving dice. Instead of one person tossing one die 6000 times, we have five people tossing five different dice, and each person tosses his or her own die 6000 times. The dice are colored red, orange, yellow, green, and blue, and are manufactured by five different companies, called Corp. A, Corp. B, Corp. C, Corp. D, and Corp. E, respectively. Four of the dice are "weighted" and one is not. There are thus 30,000 die tosses to tabulate or graph in total. When we conduct this experiment, we can tabulate the data in at least two ways.
Ungrouped frequency distribution
The simplest way to tabulate the die toss results as a frequency distribution is to combine all the tosses and show the total frequency for each die face 1 through 6. A hypothetical example of this result, called an ungrouped frequency distribution, is shown in Table 2-3. Here we ignore the weighting characteristics of each individual die and look only at potential bias in the set as a whole. It appears that, for this particular set of dice, there is some bias in favor of faces 4 and 6, some bias against faces 1 and 3, and little or no bias either for or against faces 2 and 5.
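The sketch below pools the tosses of all five dice into a single tally. It rests on stated assumptions. The face weights are invented (only the green die is fair), so the output will not match Table 2-3. `random.Random.choices` stands in for a physically weighted die.

```python
import random
from collections import Counter

FACES = range(1, 7)

# Hypothetical relative weights for faces 1..6 of each die; only "green" is fair.
# These are invented for illustration and are not taken from Table 2-3 or 2-4.
WEIGHTS = {
    "red":    [1, 1, 1, 2, 1, 2],
    "orange": [1, 2, 1, 1, 1, 2],
    "yellow": [1, 1, 1, 2, 2, 1],
    "green":  [1, 1, 1, 1, 1, 1],
    "blue":   [1, 1, 0.5, 1.5, 1, 1],
}

rng = random.Random(1)  # arbitrary seed for repeatability
ungrouped = Counter()
for color, weights in WEIGHTS.items():
    # Each die is tossed 6000 times; all results go into one pooled tally,
    # so the identity of the die is discarded.
    ungrouped.update(rng.choices(FACES, weights=weights, k=6000))

for face in FACES:
    print(face, ungrouped[face])
```

Because the die identity is thrown away, this table can reveal only the bias of the set as a whole.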
Grouped frequency distribution
If we want to be more detailed, we can tabulate the frequency for each die face 1 through 6 separately for each die. A hypothetical product of this effort, called a grouped frequency distribution, is shown in Table 2-4.
The results are grouped according to manufacturer and die color. From this distribution, it is apparent that some of the dice are heavily "weighted." Only the green die, manufactured by Corp. D, seems to lack any bias. If you are astute, you will notice (or at least strongly suspect) that the green die here is the same die, with results gathered from the same experiment, as the one portrayed in Table 2-1 and Fig. 2-3.
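A grouped tabulation keeps one tally per die instead of pooling them. This sketch again uses invented weights rather than the data behind Table 2-4. It prints a face-by-die table. It then reports, for each die, the largest departure from 1000 tosses per face (6000 / 6), which is what a fair die would give on average.

```python
import random
from collections import Counter

FACES = range(1, 7)

# color -> (manufacturer, hypothetical face weights for faces 1..6); only green is fair.
DICE = {
    "red":    ("Corp. A", [1, 1, 1, 2, 1, 2]),
    "orange": ("Corp. B", [1, 2, 1, 1, 1, 2]),
    "yellow": ("Corp. C", [1, 1, 1, 2, 2, 1]),
    "green":  ("Corp. D", [1, 1, 1, 1, 1, 1]),
    "blue":   ("Corp. E", [1, 1, 0.5, 1.5, 1, 1]),
}

rng = random.Random(1)  # arbitrary seed for repeatability
# One Counter per die: this is what makes the distribution "grouped".
grouped = {color: Counter(rng.choices(FACES, weights=w, k=6000))
           for color, (_, w) in DICE.items()}

# Table layout: one row per face, one column per die.
print("Face  " + "  ".join(f"{color:>6}" for color in DICE))
for face in FACES:
    print(f"{face:>4}  " + "  ".join(f"{grouped[c][face]:>6}" for c in DICE))

# A fair die is expected to show each face about 6000 / 6 = 1000 times.
for color, (maker, _) in DICE.items():
    worst = max(abs(grouped[color][f] - 1000) for f in FACES)
    print(color, maker, "largest deviation from 1000:", worst)
```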
Distributions Practice Problems
Practice 1
Suppose you add up all the numbers in each column of Table 2-4. What should you expect, and why? What should you expect if the experiment is repeated many times?
Solution 1
Each column should add up to 6000. This is the number of times each die (red, orange, yellow, green, or blue) is tossed in the experiment. If the sum of the numbers in any of the columns is not equal to 6000, then the experiment was done in a faulty way, or else there is an error in the compilation of the table. If the experiment is repeated many times, the sums of the numbers in each column should always be 6000.
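The consistency check described in this solution is easy to automate. The function name `check_column_sums` and the sample columns are hypothetical. The "blue" column was deliberately made to total 5990 so that it gets flagged.

```python
def check_column_sums(table, tosses_per_die=6000):
    """Return the dice whose column total differs from the number of tosses per die."""
    return [color for color, column in table.items() if sum(column) != tosses_per_die]

# Hypothetical columns (counts for faces 1..6); not the values in Table 2-4.
table = {
    "green": [1012, 987, 995, 1003, 1010, 993],  # totals 6000
    "blue":  [900, 1100, 800, 1200, 1000, 990],  # totals 5990: a tabulation error
}
print(check_column_sums(table))  # → ['blue']
```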
Practice 2
Suppose you add up all the numbers in each row of Table 2-4. What should you expect, and why? What should you expect if the experiment is repeated many times?
Solution 2
The sums of the numbers in the rows will vary, depending on the bias of the set of dice considered as a whole. Together the six row sums must total 30,000. If the set had no bias, each row sum would tend toward 30,000 / 6 = 5000. If, taken all together, the dice show any bias, and if the experiment is repeated many times, the sums should be consistently lower for some rows than for others.
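The row sums can be compared with what an unbiased set would produce. Five dice tossed 6000 times each give 5 × 6000 = 30,000 tosses, or 30,000 / 6 = 5000 per face on average. The table below is invented for illustration (each column totals 6000) and is not Table 2-4. `row_sum_deviations` is a hypothetical helper.

```python
def row_sum_deviations(table):
    """Sum each face (row) across all dice and compare it with the no-bias expectation."""
    total_tosses = sum(sum(column) for column in table.values())
    expected = total_tosses / 6  # equal share per face if the set of dice is unbiased
    row_sums = [sum(column[i] for column in table.values()) for i in range(6)]
    return row_sums, [s - expected for s in row_sums]

# Hypothetical counts for faces 1..6 of each die; each column totals 6000.
table = {
    "red":    [800, 900, 800, 1300, 900, 1300],
    "orange": [900, 1200, 900, 1000, 800, 1200],
    "yellow": [800, 1000, 900, 1300, 1000, 1000],
    "green":  [1012, 987, 995, 1003, 1010, 993],
    "blue":   [950, 1000, 700, 1200, 1000, 1150],
}

row_sums, deviations = row_sum_deviations(table)
for face, (s, d) in enumerate(zip(row_sums, deviations), start=1):
    print(f"face {face}: {s} ({d:+.0f} vs. 5000)")
```

In this made-up data, faces 4 and 6 come out well above 5000 and faces 1 and 3 well below. That is the kind of consistent row-to-row pattern the solution describes.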
Practice 3
Each small rectangle in a table, representing the intersection of one row with one column, is called a cell of the table. What do the individual numbers in the cells of Table 2-4 represent?
Solution 3
The individual numbers are absolute frequencies. They represent the actual number of times a particular face of a particular die came up during the course of the experiment.
From Statistics Demystified: A Self-Teaching Guide. Copyright © 2004 by The McGraw-Hill Companies. All Rights Reserved.