The normal distribution is a widely used statistical tool. The normal distribution is often referred to as a bell curve, given its bell-type shape and perfect symmetry. The normal curve represents a distribution of individuals and generally indicates that most individuals are typical or normal on a particular measurement, but some individuals differ from that norm; as one moves further and further away from the center of the normal curve, individuals tend to exhibit characteristics that are more atypical of the norm. Normal distributions describe many phenomena. A normal distribution also is a fundamental mathematical assumption of many commonly used statistical techniques. The normal distribution can be used for many purposes, including calculating probabilities, examining students' performance on tests relative to the performance of other students, determining when certain students' scores are unusual or highly atypical, and deriving common metrics to compare students' scores across a variety of different assessments. The normal curve is the basis of many commonly used educational measures, including the SAT, the GRE, and many commonly used intelligence tests.
Many of the variables that are studied by educational researchers are assumed to come from a population of scores that are distributed normally. The normal distribution first was discussed by de Moivre in the 1700s; however, the normal distribution did not begin to be used as a statistical tool until mathematicians such as Pierre-Simon Laplace (1749–1827) and Carl Friedrich Gauss (1777–1855) began to study the distribution (Walker, 1934). The basic idea underlying the normal distribution is that as one examines large samples of individuals, the distribution of those individuals on many characteristics will often (but certainly not always) approximate a normal distribution. This is because most individuals will be typical or in the middle on the curve, whereas there will be fewer individuals who will be either extremely low or high on any given measure.
To start with a concrete example, imagine a distribution of SAT scores for first-year students at a particular university. Most students will receive scores that are near the mean (and the median, and the mode), which would be displayed in the middle of a normal distribution. However, there are a few students who score extremely high (e.g., a perfect score of 800), and a few students who score extremely low on the SAT; those students' scores would be expressed at the extreme ends or tails of the normal distribution. This same type of curve would be evident for many other samples (e.g., the weight of bumblebees, the height of elephants, the number of barks emitted per day by pet dogs, IQ scores of third graders, etc.). In all of these cases, most measures will hover near a mean score, some will differ slightly from the mean, and there will always be a few cases that will be extremely low or high.
Although the normal distribution is often what people expect to see, many distributions turn out not to look exactly like a perfect bell curve. There are a number of criteria that can be used to assess the normality of a curve. The criteria often provide information regarding how much a particular distribution differs from a normal distribution.
First, distributions can be skewed. A distribution is skewed when the mean of a sample differs from the median. When the mean exceeds the median, a distribution will be positively skewed, or skewed to the right; when the mean is less than the median, the distribution will be negatively skewed, or skewed to the left. A skewed distribution is one that contains extreme scores at one end of the distribution; this causes one tail of the curve to look as if it is stretched outward. For example, if a researcher examined a distribution of family income in a particular neighborhood where a billionaire lived, then the distribution would be positively skewed, because this one extremely high income would pull the mean, and with it the right-hand tail of the distribution, to the right. Statisticians often indicate the level of skewness with a numerical value; a perfectly normal distribution has a skewness of zero, whereas a curve with a skewness of 2.0 or more in absolute value (i.e., at or beyond ±2.0) is quite highly skewed. A positive skewness value indicates that the distribution is skewed to the right (i.e., the mean is larger than the median), whereas a negative skewness value indicates that the distribution is skewed to the left (i.e., the mean is smaller than the median).
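The income example can be sketched in a few lines of Python. This is only an illustration: the income figures are invented, and the `skewness` helper is a hypothetical function that uses one common definition (the average cubed z-score); statistical packages may apply slightly different corrections.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Moment-based skewness: the average of the cubed z-scores (hypothetical helper)."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Invented family incomes in thousands of dollars; the last value is one extreme earner.
incomes = [42, 48, 51, 55, 58, 61, 65, 70, 5000]

print("mean:", mean(incomes))        # pulled far upward by the single extreme income
print("median:", median(incomes))    # barely affected by the extreme income
print("skewness:", skewness(incomes))  # positive, so the distribution is skewed to the right
```

A perfectly symmetric sample such as `[1, 2, 3, 4, 5]` gives a skewness of zero under this definition.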
Second, distributions also can be described in terms of their kurtosis. The term kurtosis refers to how flat or peaked a curve is. In terms of kurtosis, the normal curve is mesokurtic (i.e., neither too peaked nor too flat). In contrast, distributions that are rather flat and have high standard deviations are referred to as platykurtic distributions, whereas distributions with a thin, tall center and a low standard deviation are referred to as leptokurtic distributions. A platykurtic distribution might occur when there is a large amount of variability in a measure (e.g., scores on an achievement test in a particular school vary greatly, from some very low scores to some average scores to some very high scores); in contrast, a leptokurtic distribution might occur when there is little variability in a measure (e.g., scores on an achievement test for most students in the school were all very close to each other).
The normal curve is generally measured in standard deviation units, most commonly referred to as z-scores. When scores are expressed as z-scores, the normal distribution has a mean of zero and a standard deviation of 1.0. A z-score for any individual can be determined by subtracting the mean score of the distribution from the individual's specific score and then dividing the result by the standard deviation of the distribution.
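The z-score calculation described above can be sketched as follows. The function name `z_score` and the test scores are illustrative assumptions, not part of any particular testing program.

```python
from statistics import mean, pstdev

def z_score(score, dist_mean, dist_sd):
    """Subtract the distribution mean from the score, then divide by the standard deviation."""
    return (score - dist_mean) / dist_sd

# A hypothetical test with a mean of 500 and a standard deviation of 100.
print(z_score(650, 500, 100))  # 1.5, i.e., one and a half standard deviations above the mean

# Converting a whole (invented) set of scores to z-scores.
scores = [400, 450, 500, 550, 600]
m, sd = mean(scores), pstdev(scores)
z_scores = [z_score(s, m, sd) for s in scores]
print(z_scores)  # the converted scores have a mean of zero and a standard deviation of 1.0
```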
One of the most interesting mathematical features of the normal curve is that the percentage of scores that falls within certain areas under the curve is consistent across all normal distributions. Mathematically, approximately 34.13% of the area under the normal curve falls between the middle of the curve (which is also the mean, median, and mode) and one standard deviation above the mean (or one z-score above the mean); similarly, approximately 34.13% of the area under the curve falls between the middle of the curve and one standard deviation below the mean. The percentage of area under the curve decreases as the number of standard deviation units one moves away from the mean increases. Thus an additional 13.59% of the area under the curve falls between +1 and +2 standard deviations above the mean, and another 13.59% falls between -1 and -2 standard deviations below the mean; an additional 2.15% of the area under the normal curve falls between ±2 and ±3 standard deviations on each side of the mean (Sprinthall, 1997). The percentages of the area under the curve remain identical on the left-hand and right-hand sides of the normal distribution as one proceeds through higher standard deviations (z-scores). The exact percentage of area under the normal curve between any z-score and the mean can be determined by using a z-score table, which can be found in most introductory statistics textbooks.
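In place of a printed z-score table, these areas can be approximated with the error function in Python's standard library. The sketch below is one way to do so; `normal_cdf` and `area_between` are hypothetical helper names. Figures printed in textbook tables may differ from the computed values in the last digit because of rounding.

```python
import math

def normal_cdf(z):
    """Proportion of the area under the standard normal curve that lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low, z_high):
    """Proportion of the area under the curve between two z-scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)

# Area in each one-standard-deviation band above the mean.
for low, high in [(0, 1), (1, 2), (2, 3)]:
    print(f"z = {low} to z = {high}: {100 * area_between(low, high):.2f}%")

# The curve is symmetric, so the matching band below the mean has the same area.
print(math.isclose(area_between(-1, 0), area_between(0, 1)))
```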
Normal curves are also related to percentile scores, which are commonly used by many educational practitioners. This is particularly important, because educators often need to report and interpret individual students' test scores for parents. A percentile is the point in a distribution below which a certain percentage of scores fall; thus a student who scored in the 60th percentile on an examination scored higher than 60% of the students who took that examination (this does not mean that the student correctly answered 60% of the questions on the examination). On a normal distribution, the 50th percentile corresponds with the middle of the curve, or a z-score of zero. When z-scores are positive, percentile measures will be above 50, whereas when z-scores are negative, percentile measures will be below 50.
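The link between z-scores and percentiles can be illustrated with a short sketch, assuming the scores really are normally distributed. The helper name `percentile_from_z` is hypothetical.

```python
import math

def percentile_from_z(z):
    """Percentage of a normal distribution that falls below the given z-score."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(percentile_from_z(0.0))   # 50.0: the middle of the curve
print(percentile_from_z(1.0))   # above 50, because the z-score is positive
print(percentile_from_z(-1.0))  # below 50, because the z-score is negative
```

Under this assumption, a student at the 60th percentile has a z-score of roughly 0.25, only a quarter of a standard deviation above the mean.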
Some other scores that are typically used in the field of education also can be expressed in terms of normal distributions. One of these is the T score, which has a mean of 50 and a standard deviation of 10; another is the stanine score, which divides the normal distribution into nine distinct units. Thus a mean T score (50) corresponds to the 50th percentile, to a z-score of zero, and to a stanine score of 5.
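These conversions can be sketched from a z-score. The T score follows directly from the mean of 50 and standard deviation of 10 given above. The stanine rule shown here is an assumption not stated in the text: it uses the conventional bands that are half a standard deviation wide, with stanine 5 centered on the mean and stanines 1 and 9 absorbing the tails. Conventions for scores that fall exactly on a band boundary vary.

```python
import math

def t_score(z):
    """Rescale a z-score to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z

def stanine(z):
    """Assign a stanine from 1 to 9, assuming half-standard-deviation bands (see lead-in)."""
    band = math.floor(2 * z + 5.5)  # z = 0 falls in the middle band, stanine 5
    return max(1, min(9, band))     # the two tails collapse into stanines 1 and 9

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, t_score(z), stanine(z))
```

A z-score of zero yields a T score of 50 and a stanine of 5, matching the correspondence described above.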
See also: Standardized Testing
De Moivre, A. D. (2000). The Doctrine of Chances: A Method of Calculating the Probabilities of Events in Play. New York: Chelsea. (Originally published in 1716)
Sprinthall, R. C. (1997). Basic statistical analysis (5th ed.). Needham Heights, MA: Allyn & Bacon.
Walker, H. M. (1934). Bi-Centenary of the Normal Curve. Journal of the American Statistical Association, 29, 72–75.