Evaluating Results: Statistics, Probability, and Proof
When you do an experiment or a survey comparing two sets of people or things, the job of statistics is to show whether you have a significant difference between the two sets. A small difference between the averages or means may or may not be significant. How do you decide?
The experts use a system that has two basic stages:
- They examine the data to find out how much variation there already is among the specimens.
- They use that variation as a basis for deciding that the experimental difference, or survey difference, is enough to be a significant difference.
Another useful statistic is the median. In the test of reaction times by the ruler-drop method, we asked each partner to measure five catches by the other partner, then took the average, or mean. We might instead have chosen the number in the middle, which is called the median. It can often be as useful as the mean and takes less time to calculate.
Statistically Meaningful Results
The example of taking five measurements across a room in the measurement chapter may be thought of as a small sample among all the possible measurements that could be made. We could also set up a program of making many such measurements, or of many people each making many measurements, so that one might eventually have thousands or millions of measurements.
The large, unknown number of measurements of which any one measurement is considered a sample can be known only from the sample. This is like eating cookies from a cookie jar. No matter how enjoyable the first, second, or third, we will never know how good the remaining cookies are from the samples only. We can only predict or infer that the uneaten cookies, the population from which the sample came, are like the sample.
How do scientists judge whether their sample of measurements (or findings expressed other ways) fairly represents all possible measurements? Let's say that Alice is doing an experiment as her science project in which she has planted popcorn seeds in two planters to test the value of a fertilizer. She is going to compare the two plantings, one with the fertilizer and the other without but otherwise grown under uniform conditions. To keep the numbers small for quick, easy measuring, let's say that each planter has five healthy, growing plants. Suppose that Alice measures the heights of the five plants in one of the planters and finds the following:
- Plant A 57.2 cm
- Plant B 57.2 cm
- Plant C 57.2 cm
- Plant D 57.2 cm
- Plant E 57.2 cm
What? All the same? Most of us who have had experience with growing things would immediately say that this is highly improbable, that it is just a coincidence that all the plants would be precisely the same height. Correct! It is a matter of chance or probability. Probability, you will find, is the main theme in the evaluation of scientific findings. Suppose, now, that Alice's measuring had brought the following results:
- Plant A 57.9 cm
- Plant B 55.7 cm
- Plant C 58.4 cm
- Plant D 59.2 cm
- Plant E 57.3 cm
"That's more like it," we would say. We expect differences in things, especially in living, growing things. That is, it is highly probable that the heights would not be all the same.
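The variation in this second set of measurements can be put into numbers. What follows is only a minimal sketch using Python's standard `statistics` module; the variable names are chosen here for illustration and are not part of Alice's project.

```python
from statistics import mean, median, stdev

# Alice's five measurements from the list above, in centimeters.
heights_cm = [57.9, 55.7, 58.4, 59.2, 57.3]

average_cm = mean(heights_cm)                    # the mean, or average
middle_cm = median(heights_cm)                   # the middle value once sorted
range_cm = max(heights_cm) - min(heights_cm)     # tallest minus shortest
spread_cm = stdev(heights_cm)                    # sample standard deviation

print(f"mean   = {average_cm:.1f} cm")
print(f"median = {middle_cm:.1f} cm")
print(f"range  = {range_cm:.1f} cm")
print(f"stdev  = {spread_cm:.2f} cm")
```

The mean works out to 57.7 cm and the median to 57.9 cm. The range and the standard deviation are two common ways of saying how much the plants differ among themselves.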
Now, whether we like the sample or not, it is all we know about the larger population of plants that Alice's supply of seed might grow. Suppose, again, that Ken planted 100 seeds from the same supply as Alice's and under very much the same conditions. Then suppose he went to work measuring them at the same stage as Alice's plants. We would like to see how the sizes vary in this much larger sample, so we make a frequency distribution (see figure 13.1) showing the sizes. That is, an "X" mark is made for each corn plant over its height measurement, which is listed along the bottom of the chart.
We see that there are not many of the shortest and tallest plants but more of each size in the middle of the range. If we drew a line over the tops of the columns of sizes (and if we had many more specimens measured and recorded) the line, or line graph, would look something like the one in figure 13.2.
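A chart of this kind can be sketched in a few lines of Python. Ken's actual measurements are not given here, so the 100 heights below are simulated, not real data: they are drawn at random from a bell-shaped distribution whose center (57.7 cm) and spread (1.3 cm) are assumptions borrowed from Alice's five plants. The chart is also turned on its side, with the heights listed down the left edge instead of along the bottom.

```python
import random
from collections import Counter

random.seed(13)  # fixed seed so the same chart appears on every run

# ASSUMED values, not Ken's data: center and spread of the simulated heights.
ASSUMED_MEAN_CM = 57.7
ASSUMED_SD_CM = 1.3

# Simulate 100 plants and round each height to the nearest centimeter.
heights = [round(random.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM)) for _ in range(100)]
counts = Counter(heights)

# One row per height, one "X" per plant of that height.
chart_lines = []
for height in range(min(heights), max(heights) + 1):
    chart_lines.append(f"{height:>3} cm | " + "X" * counts[height])

print("\n".join(chart_lines))
```

With a sample of this size the rows near the middle should be the longest, and the rows for the shortest and tallest plants should hold only a few marks.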
Such a distribution of a large number of things (and it must be large, preferably in the thousands) is called a normal distribution. Many things show normal distribution when they are measured and graphed like this, for example, the heights of large numbers of people picked at random and the amounts of food eaten per person per year. This widespread nature of things to show normal distribution has been used by scientists and statisticians to work out ever more meaningful designs for science investigations. Most modern scientists are thinking about the statistics they will use to analyze their findings from the beginning or planning stages of their investigations. They are saying something like this: "I don't want my experiment to come out as some queer, quirky thing that proves nothing. How must I plan now so that in the end my results will be statistically meaningful?" Scientists know, however, that there can be no perfect answer to their questions. They can always, just by chance, get results that show unexpected quirks.
Nevertheless, as a scientist does her investigation, she is trying to uncover some meaningful results. This means more than just saying, "Yes" or "No" to the hypothesis. It means going beyond the small number of subjects she may be dealing with in her experiment or survey. It means having confidence that her findings may be stretched, or generalized, to any larger group of similar subjects. Did ingredient Q seem to prevent sunburn in the experimental group of people who used it? If so, and if that experimental group fairly represents the larger population, we may then reasonably expect that ingredient Q will prevent sunburn in most of the larger population.
The use of random choices in the first stages of an investigation means more than just helping to keep the scientist's prejudices from affecting the results. It helps to assure that the sample of people, or other subjects, used in the investigation will allow us to generalize to the larger group that the sample is intended to represent.
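One simple way to make such random choices is to let a shuffle decide which subjects go into which group. The sketch below assumes ten seeds with made-up labels and a fixed random seed (both chosen here only so the example is repeatable); it is one possible procedure, not a prescribed one.

```python
import random

# Hypothetical labels for ten seeds; any identifiers would do.
seeds = [f"seed-{number}" for number in range(1, 11)]

rng = random.Random(2024)  # fixed seed so the assignment can be repeated
shuffled = seeds[:]        # copy, so the original list stays in order
rng.shuffle(shuffled)

# The first five shuffled seeds go to one planter, the rest to the other.
control_group = shuffled[:5]
experimental_group = shuffled[5:]

print("control:     ", control_group)
print("experimental:", experimental_group)
```

Because the shuffle, not the experimenter, decides where each seed goes, neither planter is favored with the seeds that happen to look most promising.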
Can You Prove It?
Let's say that Alice is doing an experiment as her science project in which she has planted popcorn seeds in two planters to test the value of a fertilizer. She uses the controlled experiment design.
To the experimental group she adds a chemical fertilizer, urea, a nitrogen compound that may be put into the soil or dissolved in the water given the plants. Her independent variable is the addition of the urea to the experimental group. Her dependent variable, if she observes one, is the difference in growth rate (height or weight) of the plants in her two planters.
At a proper time in her experiment, she measures the heights of the plants with the following results:
We see that there is a difference between the means (commonly called averages) of the two groups. The difference is 1.9 cm in favor of the experimental group; the average height of the plants in that group is 1.9 cm taller than the height of the plants in the control group. This looks good.
"See!" Alice says. "Adding urea to the experimental planting has made the corn grow faster." Can she be sure of this? No, she cannot. Maybe it was a chance happening that she got five taller growing plants in the experimental group and five shorter growing plants in the control. She should not make any decision just yet. She should get someone to make a good statistical treatment (unless she can do it herself) that would go beyond comparing the mean heights of the two groups.
A statistical analysis would show how much the heights vary among themselves. Then it would show how the means compare with a larger "population" of plants like Ken's 100 plants. Where would this larger population be found? It would be imagined, inferred, or hypothetical: it would be created out of the variability, the range, the scatter of her sample and the size of the sample. It would be created by the use of equations in statistics books.
Furthermore, a judgment would be made about the chance, or the probability, that the difference Alice found was or was not simply a chance difference. This, too, would be done by reference to appropriate tables in statistics books. Actually, the number of plants in Alice's experiment is too small (only five) to make it worth all of that analysis, yet her results are supported by agricultural research by professional scientists and by the experiences of the thousands of farmers who have found it useful to apply urea and other nitrogen compounds to their corn plantings.
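The question "could this difference be a chance happening?" can also be explored directly, without tables. The sketch below uses a permutation test, which is a different technique from the table-based methods mentioned above: it tries every possible way of splitting the ten plants into two groups of five and counts how often the split gives a difference at least as large as the one observed. Alice's own table of results is not reproduced here, so the numbers are stand-ins: the five heights listed earlier are treated as the control group (an assumption), and the experimental heights are invented for illustration, chosen only so that the means differ by 1.9 cm.

```python
from itertools import combinations
from statistics import mean

# Heights listed earlier in the text; treating them as the control group is an assumption.
control = [57.9, 55.7, 58.4, 59.2, 57.3]
# HYPOTHETICAL experimental heights, invented so the means differ by 1.9 cm.
experimental = [58.8, 59.1, 59.5, 60.0, 60.6]

observed = mean(experimental) - mean(control)

pooled = control + experimental
total = sum(pooled)

n_splits = 0     # number of ways to pick five of the ten plants
n_as_large = 0   # splits whose difference is at least the observed one

for indexes in combinations(range(len(pooled)), len(experimental)):
    group_a = [pooled[i] for i in indexes]
    group_b_mean = (total - sum(group_a)) / len(control)
    difference = mean(group_a) - group_b_mean
    n_splits += 1
    # Small tolerance guards against floating-point rounding on exact ties.
    if difference >= observed - 1e-9:
        n_as_large += 1

p_value = n_as_large / n_splits
print(f"observed difference: {observed:.1f} cm")
print(f"{n_as_large} of {n_splits} splits are at least that large (p = {p_value:.3f})")
```

A small fraction suggests that chance alone would seldom produce such a difference. With only five plants per group, though, the answer depends heavily on the particular numbers, which is the caution raised above.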
With all of that support, why wouldn't scientists declare that they have proven the value of this treatment of corn? The problem lies partly in this question: How can you know when you have proven a thing to be true? And it lies partly in the way the words "prove" and "true" are used in mathematics and logic as compared to the way they are used in ordinary speech.
First, the mathematics and logic. You and I can agree that this is a true statement in arithmetic: 148 + 293 + 167 = 608. That is, we follow certain rules of mathematics to prove whether the statement is an equality. Mathematicians would not agree, however, that we had proven it by following the rules of addition. They are more concerned about the sources of those rules. In the end, they would show that the statement was proven by agreeing on certain things about arithmetic and its rules.
In logic of the formal sort, proof would be much the same, as in this example:
- If all wangtups have gitly speekrongs,
- And if Q is a wangtup,
- Then Q has gitly speekrongs.
Even though the statements do not mean anything in real life, if we accept the first and second statements as true, then the conclusion, the third statement, is also true. The "proof" is all right there in the statement. It has nothing to do with real people or things and their mixed-up ways.
Still, these simple examples do not do justice to mathematics and logic. Both are fascinating and powerful tools of thought or reasoning that humankind has created. The proof or truth of these examples, however, is so very much different from the kinds of proof that scientists are seeking that it becomes awkward to try to use the same language to describe them all. Even though mathematicians and logicians got there first with the terms "prove" and "true," scientists in recent times have pulled away from using these terms.
In ordinary experience as well there is a problem with these key words. Most people would say, "See, Alice proved it! It is true that urea makes corn grow faster." Or they might say, "That proves it! Hocus is better for a headache than Pocus," even though they may have used the medication only one time and their test has serious weaknesses. Or, again: "That proves it! Dreams do foretell the future. I knew that you were coming because I dreamed about it!"
These difficulties with the language, however, do not provide the main objection to the use of "prove" in scientific work. When we talk about "proving" something in science we are, in effect, predicting the future as well as examining the present. How much can we depend on something happening in the future just because today's scientific findings show it to be probable now?
In Alice's experiment, for example, she used only five plants in each planter. Such a small sample cannot tell us much about the larger population of future corn plantings, no matter how much statistical analysis we apply to it. However, let's do some more analysis of Alice's results to see how this helps us to learn about the predictive value of her findings. Let's rearrange the measurements of the corn plants according to height (see table 13.2).
Does this tell us more than a simple comparison of the means? Suppose her results in the experimental group had been as in table 13.3 (also ranked by height).
Here we see that the difference between the means of the two groups is the same as in table 13.2. But notice the range of heights in table 13.3. The experimental plants are not as uniformly taller than the control plants as they were in table 13.2. There is more variability. These results would provide a less reliable basis for predicting about future plantings.
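The same comparison can be sketched in code. Tables 13.2 and 13.3 are not reproduced here, so all of the experimental heights below are hypothetical, invented only to show two groups that share a mean but differ in scatter. The five heights listed earlier stand in for the control group, which is also an assumption.

```python
from statistics import mean, stdev

# Heights listed earlier in the text, ranked; used as the control group by assumption.
control = sorted([57.9, 55.7, 58.4, 59.2, 57.3])
# HYPOTHETICAL experimental groups with the same mean (59.6 cm) but different scatter.
uniform_exp = sorted([57.6, 59.2, 59.8, 60.3, 61.1])
scattered_exp = sorted([54.9, 57.0, 59.8, 62.1, 64.2])


def summarize(experimental, control_group):
    """Return (difference of means, spread of experimental group, ranked pairs won)."""
    difference = mean(experimental) - mean(control_group)
    spread = stdev(experimental)
    # Compare shortest with shortest, next with next, and so on.
    taller_pairs = sum(e > c for e, c in zip(experimental, control_group))
    return difference, spread, taller_pairs


uniform_summary = summarize(uniform_exp, control)
scattered_summary = summarize(scattered_exp, control)

for name, (difference, spread, pairs) in [
    ("uniform", uniform_summary),
    ("scattered", scattered_summary),
]:
    print(f"{name:>9}: difference {difference:.1f} cm, "
          f"stdev {spread:.2f} cm, taller in {pairs} of 5 ranked pairs")
```

Both invented groups beat the control by the same 1.9 cm on average, but the scattered group is taller in only some of the ranked pairs and has a much larger standard deviation.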
I hope that you begin to agree, if you had not already known, that statistical treatment of data can reveal useful information. Finding the means and their difference is statistical analysis. Ranking the heights and comparing the pairs of plants is statistical analysis. These two ways of analyzing data are very elementary (even antiquated) when compared with the methods used by people with more mathematical and statistical knowledge.