Education.com

# Evaluating Results: Statistics, Probability, and Proof

By Thomas Moorman
John Wiley & Sons, Inc.
Updated on Jan 1, 2011

When you do an experiment or a survey comparing two sets of people or things, the job of statistics is to show whether you have a significant difference between the two sets. A small difference between the averages or means may or may not be significant. How do you decide?

The experts use a system that has two basic stages:

1. They examine the data to find out how much variation there already is among the specimens.
2. They use that variation as a basis for deciding whether the experimental difference, or survey difference, is large enough to be a significant difference.
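
As a rough sketch of these two stages, the Python below first measures the variation inside each group, then compares the difference between the group means against that variation. The yardstick used here, Welch's t statistic, is one common choice and is an assumption; the text does not name a particular test. The function name `welch_t` and both lists of numbers are made up for illustration.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Difference between two group means, scaled by the variation within the groups."""
    # Stage 1: how much variation is there already inside each group?
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    # Stage 2: compare the difference between the means against that variation.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error

# Made-up measurements, for illustration only (not from the text).
untreated = [10.0, 12.0, 11.0, 13.0, 9.0]
treated = [14.0, 15.0, 13.0, 16.0, 12.0]

t_value = welch_t(untreated, treated)
print(f"t = {t_value:.2f}")  # prints "t = 3.00"
```

Here the means differ by 3 units and the standard error is 1 unit, so the statistic comes out at 3. Turning that number into a probability takes a t table or statistical software, which this sketch leaves out.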

Another useful statistic is the median. In the test of reaction times by the ruler-drop method, we asked each partner to measure five catches by the other partner, then took the average, or mean. We might instead have chosen the number in the middle, which is called the median. It can often be as useful as the mean and takes less time to calculate.
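
To see how the two statistics can differ, here is a small Python sketch. The five catch distances are hypothetical values chosen for illustration, not data from the ruler-drop test; one deliberately slow catch is included.

```python
import statistics

# Hypothetical ruler-drop catches in centimeters (illustrative values, not data from the text).
catches_cm = [18.0, 21.0, 19.0, 35.0, 20.0]

mean_cm = statistics.mean(catches_cm)      # add them up, divide by five
median_cm = statistics.median(catches_cm)  # sort them, take the middle one

print(f"mean = {mean_cm:.1f} cm, median = {median_cm:.1f} cm")
# prints "mean = 22.6 cm, median = 20.0 cm"
```

The single slow catch of 35 cm pulls the mean upward, while the median stays at the middle value.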

### Statistically Meaningful Results

The example of taking five measurements across a room in the measurement chapter may be thought of as a simple sample from all the possible measurements that could be made. We could also set up a program of making many such measurements, or of many people each making many measurements, so that one might eventually have thousands or millions of measurements.

The large, unknown number of possible measurements, of which any one measurement is considered a sample, can be known only from the sample. This is like eating cookies from a cookie jar. No matter how enjoyable the first, second, or third, we can never know for certain from the samples alone how good the remaining cookies are. We can only predict or infer that the uneaten cookies, the population from which the sample came, are like the sample.

How do scientists judge whether their sample of measurements (or findings expressed other ways) fairly represents all possible measurements? Let's say that Alice is doing an experiment as her science project in which she has planted popcorn seeds in two planters to test the value of a fertilizer. She is going to compare the two plantings, one with the fertilizer and the other without but otherwise grown under uniform conditions. To keep the numbers small for quick, easy measuring, let's say that each planter has five healthy, growing plants. Suppose that Alice measures the heights of the five plants in one of the planters and finds the following:

Plant A     57.2 cm
Plant B     57.2 cm
Plant C     57.2 cm
Plant D     57.2 cm
Plant E     57.2 cm

What? All the same? Most of us who have had experience with growing things would immediately say that this is highly improbable, that it is just a coincidence that all the plants would be precisely the same height. Correct! It is a matter of chance or probability. Probability, you will find, is the main theme in the evaluation of scientific findings. Suppose, now, that Alice's measuring had brought the following results:

Plant A     57.9 cm
Plant B     55.7 cm
Plant C     58.4 cm
Plant D     59.2 cm
Plant E     57.3 cm
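
To put a number on how much these five heights vary, the short Python sketch below computes the mean, the range, and the sample standard deviation, and contrasts them with the first set of identical heights. The standard deviation is a common measure of spread that the text does not itself name; its use here is an assumption made for illustration.

```python
import statistics

identical_cm = [57.2, 57.2, 57.2, 57.2, 57.2]  # Alice's first set of heights
varied_cm = [57.9, 55.7, 58.4, 59.2, 57.3]     # Alice's second set of heights

def summarize(heights_cm):
    """Return (mean, range, sample standard deviation) for a list of heights."""
    mean = statistics.mean(heights_cm)
    spread = max(heights_cm) - min(heights_cm)
    std_dev = statistics.stdev(heights_cm)  # sample standard deviation (n - 1)
    return mean, spread, std_dev

for label, heights in [("identical", identical_cm), ("varied", varied_cm)]:
    mean, spread, std_dev = summarize(heights)
    print(f"{label}: mean {mean:.1f} cm, range {spread:.1f} cm, std dev {std_dev:.2f} cm")
# identical: mean 57.2 cm, range 0.0 cm, std dev 0.00 cm
# varied: mean 57.7 cm, range 3.5 cm, std dev 1.32 cm
```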

"That's more like it," we would say. We expect differences in things, especially in living, growing things. That is, it is highly probable that the heights would not be all the same.

Now, whether we like the sample or not, it is all we know about the larger population of plants that Alice's supply of seed might grow. Suppose, again, that Ken planted 100 seeds from the same supply as Alice's and under very much the same conditions. Then suppose he went to work measuring them at the same stage as Alice's plants. We would like to see how the sizes vary in this much larger sample, so we make a frequency distribution (see figure 13.1) showing the sizes. That is, an "X" mark is made for each corn plant over its height measurement, which is listed along the bottom of the chart.
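
A tally of this kind can be sketched in a few lines of Python. Ken's actual measurements are not listed in the text, so the 100 heights below are simulated; the center (57.7 cm) and spread (1.3 cm) are assumptions borrowed from Alice's five plants, and the function names are hypothetical. The chart is printed sideways, with one row of "X" marks per whole-centimeter height.

```python
import random
from collections import Counter

def simulate_heights(n, mean_cm, sd_cm, seed=0):
    """Stand-in for real measurements: n simulated plant heights (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_cm, sd_cm) for _ in range(n)]

def frequency_distribution(heights_cm):
    """Count how many plants fall at each height, rounded to the nearest centimeter."""
    return Counter(round(h) for h in heights_cm)

def render_tally(counts):
    """One row per height, with one X for each plant at that height."""
    rows = []
    for height in range(min(counts), max(counts) + 1):
        rows.append(f"{height:>3} cm | " + "X" * counts[height])
    return "\n".join(rows)

heights = simulate_heights(100, mean_cm=57.7, sd_cm=1.3)
counts = frequency_distribution(heights)
print(render_tally(counts))
```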

We see that there are not many of the shortest and tallest plants but more of each size in the middle of the range. If we drew a line over the tops of the columns of sizes (and if we had many more specimens measured and recorded), the line, or line graph, would look something like the one in figure 13.2.

Such a distribution of a large number of things (and it must be large, preferably in the thousands) is called a normal distribution. Many things show a normal distribution when they are measured and graphed like this, for example, the heights of large numbers of people picked at random and the amounts of food eaten per person per year. This widespread tendency of things to show a normal distribution has been used by scientists and statisticians to work out ever more meaningful designs for science investigations. Most modern scientists think about the statistics they will use to analyze their findings from the beginning, or planning, stages of their investigations. They are saying something like this: "I don't want my experiment to come out as some queer, quirky thing that proves nothing. How must I plan now so that in the end my results will be statistically meaningful?" Scientists know, however, that there can be no perfect answer to their questions. They can always, just by chance, get results that show unexpected quirks.

Nevertheless, as a scientist does her investigation, she is trying to uncover some meaningful results. This means more than just saying, "Yes" or "No" to the hypothesis. It means going beyond the small number of subjects she may be dealing with in her experiment or survey. It means having confidence that her findings may be stretched, or generalized, to any larger group of similar subjects. Did ingredient Q seem to prevent sunburn in the experimental group of people who used it? If so, and if that experimental group fairly represents the larger population, we may then reasonably expect that ingredient Q will prevent sunburn in most of the larger population.

The use of random choices in the first stages of an investigation means more than just helping to keep the scientist's prejudices from affecting the results. It helps to assure that the sample of people, or other subjects, used in the investigation will allow us to generalize to the larger group that the sample is intended to represent.
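
One way to make such random choices is to let a shuffle decide which subjects go into which group. The Python sketch below is a minimal illustration, assuming ten seeds to be split between a fertilized and an unfertilized planter; the function name `random_assignment` and the seed labels are hypothetical, not taken from the text.

```python
import random

def random_assignment(subjects, seed=None):
    """Shuffle the subjects, then split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    shuffled = list(subjects)  # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical labels for ten popcorn seeds.
seeds = [f"seed-{i}" for i in range(1, 11)]
fertilized, unfertilized = random_assignment(seeds, seed=42)

print("fertilized:  ", fertilized)
print("unfertilized:", unfertilized)
```

Because chance, not the experimenter, decides which seed lands in which planter, neither group is favored by the experimenter's own choices.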
