The Normal Distribution and Genetics Help

By McGraw-Hill Professional
Updated on Aug 23, 2011

Normal Distribution

The study of a quantitative trait in a large population usually reveals that very few individuals possess the extreme phenotypes and that progressively more individuals are found nearer the average value for that population. This type of symmetrical distribution is characteristically bell-shaped, as shown in Fig. 8-2, and is called a normal distribution. It is approximated by the binomial distribution $(p+q)^n$ when the power of the binomial (n) is very large and p and q are both 1/n or greater; p and q represent the probabilities of two alternative independent events, so that p + q = 1.
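
The approach of the binomial toward the bell curve can be checked numerically. The Python sketch below is an illustration under our own choices (n = 50 and p = q = 1/2 are assumptions, and the helper names are hypothetical): it expands $(p+q)^n$ term by term and compares each term with the height of a normal curve having the same mean, np, and standard deviation, √(npq).

```python
import math

def binomial_terms(n, p):
    """Terms of (p + q)^n: the probability of k 'p' events out of n, k = 0..n."""
    q = 1 - p
    return [math.comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n, p = 50, 0.5                       # illustrative choices, not from the text
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

terms = binomial_terms(n, p)
gaps = [abs(t - normal_density(k, mu, sigma)) for k, t in enumerate(terms)]
print(f"sum of all terms: {sum(terms):.6f}")
print(f"largest gap between binomial term and normal curve: {max(gaps):.5f}")
```

The terms sum to 1, as $(p+q)^n = 1$ requires, and the largest gap is well under 0.001: at the peak (k = 25) the binomial term is about 0.1123 and the normal curve about 0.1128. Raising n shrinks the gap further.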


Average Measurements

The average phenotypic value for a normally distributed trait is expressed as the arithmetic mean, $\bar{X}$ (read "X bar"). The arithmetic mean is the sum of the individual measurements (Σ X) divided by the number of individuals measured (N), where the Greek letter "sigma" (Σ) directs the statistician to sum what follows:

$$\bar{X} = \frac{\sum X}{N}$$

It is usually not feasible to measure every individual in a population; therefore, measurements are usually made on a sample from that population in order to estimate the population value (parameter). If the sample is truly representative of the larger population of which it is a part, then the sample mean ($\bar{X}$) will be an accurate estimate of the mean of the entire population (μ). Note that letters from the English alphabet are used to represent statistics, i.e., measurements derived from a sample, whereas Greek letters are used to represent parameters, i.e., attributes of the population from which the sample was drawn. Parameters are seldom known and must be estimated from results gained by sampling. In general, the larger the sample size, the more accurately the statistic estimates the parameter.
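
The mean formula and the effect of sample size can be illustrated with a short simulation. This Python sketch is hypothetical throughout: it assumes a normally distributed population with μ = 56 and σ = 6 (values chosen only for illustration), draws samples of increasing size with a fixed seed, and reports how far each sample mean lands from μ.

```python
import random

def sample_mean(measurements):
    """X-bar: the sum of the measurements divided by the number measured."""
    return sum(measurements) / len(measurements)

# Hypothetical population parameters (mu, sigma), chosen only for illustration.
MU, SIGMA = 56.0, 6.0
rng = random.Random(2011)  # fixed seed so the run is repeatable

for n in (10, 100, 10_000):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sample_mean(sample)
    print(f"n = {n:>6}: X-bar = {x_bar:6.2f}  |X-bar - mu| = {abs(x_bar - MU):.3f}")
```

A single seeded run is only suggestive: the error shrinks on average, roughly in proportion to σ/√n, but a small sample can occasionally land closer to μ than a larger one.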

Measurements of Variability

  1. Standard Deviation. Consider the three normally distributed populations shown in Fig. 8-3. Populations A and C have the same mean, but C is much more variable than A. Populations A and B have different means but otherwise appear to have the same shape (dispersion). Therefore, to define a normal distribution adequately, we must know not only its mean but also how much variability exists. One of the most useful measures of variability in a population for genetic purposes is the standard deviation, symbolized by the lowercase Greek letter "sigma" (σ). A sample drawn at random from this population will have a sample standard deviation (s). To calculate s, the sample mean ($\bar{X}$) is subtracted from each individual measurement ($X_i$), each deviation ($X_i - \bar{X}$) is squared, the squared deviations are summed over all individuals in the sample, and the sum is divided by $n - 1$, where n is the sample size. The calculation is completed by taking the square root of this value:

    $$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

    To calculate σ, we use the population mean (μ) in place of $\bar{X}$ and divide by the total population size (N) rather than by $n - 1$:

    $$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$

    For samples smaller than about 30, the denominator should be $n - 1$; for larger samples, it makes little difference to the value of s whether n or $n - 1$ is used in the denominator.

    All other things being equal, the larger the sample size, the more accurately the statistic s should estimate the parameter σ. Because a calculator can accumulate sums of squared numbers directly, it is usually easier to calculate s by the equivalent formula

    $$s = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$$
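
    The definitional formula, the calculator formula, and the population formula can be compared side by side. This Python sketch uses a hypothetical set of six plant heights; the function names are our own, not from the text.

```python
import math

def sample_sd(xs):
    """s = sqrt( sum (X_i - X_bar)^2 / (n - 1) ): the definitional formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

def sample_sd_computational(xs):
    """s = sqrt( (sum X^2 - (sum X)^2 / n) / (n - 1) ): the calculator form."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

def population_sd(xs):
    """sigma = sqrt( sum (X_i - mu)^2 / N ): used when every individual is measured."""
    big_n = len(xs)
    mu = sum(xs) / big_n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / big_n)

heights = [50, 53, 56, 56, 59, 62]  # hypothetical plant heights in inches
print(round(sample_sd(heights), 4))
print(round(sample_sd_computational(heights), 4))
print(round(population_sd(heights), 4))
```

    For these six values the mean is 56 and the squared deviations sum to 90, so both sample formulas give s = √(90/5) ≈ 4.24, while treating the six as a complete population gives σ = √(90/6) ≈ 3.87. Python's standard `statistics.stdev` and `statistics.pstdev` use the $n - 1$ and N denominators, respectively.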

    It is a property of every normal distribution that approximately 2/3 of the measurements (68%) lie within one standard deviation of the mean (μ ± σ), approximately 19/20 (95%) lie within two standard deviations (μ ± 2σ), and more than 99% lie within three standard deviations (μ ± 3σ).
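
    The rounded percentages can be checked against exact normal probabilities. For a normal distribution, the fraction of measurements within μ ± kσ equals erf(k/√2), so a minimal Python sketch (function name hypothetical) needs only the standard `math.erf`:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mu ± k*sigma.

    For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {fraction_within(k):.4f}")
```

    It prints about 0.6827, 0.9545 and 0.9973, the values behind the rounded 68%, 95% and "more than 99%".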

    EXAMPLE 8.1 The mean height of a sample from a plant population is 56 in; the sample standard deviation is 6 in. This indicates that approximately 2/3 of the sample will be found between 56 − 6 = 50 in and 56 + 6 = 62 in. Approximately 95% of the plants will lie between 44 and 68 in, so approximately 2.5% of all plants in this sample will measure smaller than 56 − (2 × 6) = 56 − 12 = 44 in and 2.5% will measure larger than 56 + (2 × 6) = 68 in.
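
    The arithmetic of Example 8.1 can be reproduced in a few lines, assuming the heights are approximately normal. The tail fractions use the normal cumulative distribution written with `math.erf`; the helper name is hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mean, sd = 56.0, 6.0  # Example 8.1: plant heights in inches

one_sd = (mean - sd, mean + sd)          # range holding about 2/3 of the plants
two_sd = (mean - 2 * sd, mean + 2 * sd)  # range holding about 95% of the plants
below = normal_cdf(two_sd[0], mean, sd)        # expected fraction shorter than 44 in
above = 1.0 - normal_cdf(two_sd[1], mean, sd)  # expected fraction taller than 68 in

print("about 2/3 between", one_sd, "in")
print("about 95% between", two_sd, "in")
print(f"shorter than {two_sd[0]} in: {below:.4f}; taller than {two_sd[1]} in: {above:.4f}")
```

    Each tail comes out at about 0.023, that is, roughly 2.5% of the plants on each side of the ±2 standard deviation range.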

    The standard deviation can be plotted on a normal distribution by locating the point of inflection of the curve (point of maximum slope). A perpendicular constructed from the baseline that intersects the curve at this point is one standard deviation from the mean (Fig. 8-2).
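
    Why the inflection point sits exactly one standard deviation from the mean follows from the normal density. The derivation below is standard calculus rather than part of the text: differentiate the density twice and find where the second derivative vanishes.

```latex
\begin{aligned}
f(x)   &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} \\
f'(x)  &= -\frac{x-\mu}{\sigma^2}\, f(x) \\
f''(x) &= \left[\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f(x) \\
f''(x) = 0 &\iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
\end{aligned}
```

    Because f(x) is positive everywhere, the curvature changes sign only at μ ± σ, which is where the slope of the curve is steepest.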


  2. Coefficient of Variation. Traits with relatively large average metric values are generally expected to have correspondingly larger standard deviations than traits with relatively small average metric values. Furthermore, because different traits may be measured in different units, the coefficient of variation is useful for comparing their relative variabilities. It is the standard deviation divided by the mean, $CV = s / \bar{X}$; because s and $\bar{X}$ are expressed in the same units, the ratio is independent of the units of measurement.
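
    To see that the ratio is unit-free, a small Python sketch can compute the coefficient of variation for the same hypothetical measurements expressed in inches and in centimeters. The function name is our own, and the figures simply reuse those of Example 8.1.

```python
def coefficient_of_variation(sd, mean):
    """CV = s / X-bar; s and X-bar share units, so the ratio has none."""
    return sd / mean

INCH_TO_CM = 2.54

# Hypothetical trait: mean 56 in, standard deviation 6 in.
cv_in = coefficient_of_variation(6.0, 56.0)
cv_cm = coefficient_of_variation(6.0 * INCH_TO_CM, 56.0 * INCH_TO_CM)
print(f"CV in inches: {cv_in:.4f}, CV in centimeters: {cv_cm:.4f}")
```

    Both values come out at about 0.1071, so traits recorded in different units can be compared on this common scale.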
  3. Variance. The square of the standard deviation is called the variance ($\sigma^2$). Unlike the standard deviation, however, the variance cannot be plotted on the normal curve and can only be represented mathematically. Variance is widely used as an expression of variability because its components are additive. By a technique called "analysis of variance," the total phenotypic variance ($\sigma^2_P$) expressed by a given trait in a population can be statistically partitioned into components of genetic variance ($\sigma^2_G$), nongenetic (or environmental) variance ($\sigma^2_E$), and variance due to genotype-environment interaction ($\sigma^2_{GE}$). Thus,

    $$\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{GE}$$
    It is beyond the scope of this text to present the analysis of variance, but a knowledge of variance components is essential to a discussion of breeding theory. Both the genetic variance and environmental variance can be further partitioned by this technique, so that the relative contributions of a number of factors influencing a quantitative trait can be ascertained. In order to simplify discussion, we shall ignore the interaction component.
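
    The additivity that makes variance useful can be illustrated by simulation. This Python sketch works only under stated assumptions: the genetic (G) and environmental (E) effects are hypothetical, normally distributed, independent of each other, and simply added to give the phenotype, with no interaction term (matching the simplification adopted above). Under those assumptions the phenotypic variance should come out close to $\sigma^2_G + \sigma^2_E$.

```python
import random

def variance(xs):
    """Mean squared deviation from the mean (N denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(8)  # fixed seed so the run is repeatable
n = 100_000

# Hypothetical components: sd 3 (variance 9) genetic, sd 4 (variance 16) environmental.
g = [rng.gauss(0.0, 3.0) for _ in range(n)]
e = [rng.gauss(0.0, 4.0) for _ in range(n)]
p = [gi + ei for gi, ei in zip(g, e)]  # phenotype = G + E, no interaction

var_g, var_e, var_p = variance(g), variance(e), variance(p)
print(f"var(G) = {var_g:.2f}, var(E) = {var_e:.2f}")
print(f"var(P) = {var_p:.2f} vs var(G) + var(E) = {var_g + var_e:.2f}")
```

    The variances add to about 25, but the standard deviations do not: √25 = 5, not 3 + 4 = 7. That is why variance, rather than standard deviation, is the quantity that gets partitioned.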

    EXAMPLE 8.2 An analysis of variance performed on the birth weights of humans produced the following results:

  4. Variance Method of Estimating the Number of Genes. A population such as a line, a breed, a variety, a strain, a subspecies, etc., is composed of individuals that are more nearly alike in their genetic composition than those in the species as a whole. Some phenotypic variability is usually expressed even in a group of organisms that are genetically identical; all such variability within pure lines is environmental in origin. Crosses between two pure lines produce a genetically uniform hybrid F1, so phenotypic variability in the F1 is likewise nongenetic in origin. In the formation of the F2 generation, the parental genes are reshuffled into new combinations among the F2 individuals. It is a common observation that the F2 generation is much more variable than the F1 from which it was derived (Fig. 8-4).

    In a normally distributed trait, the means of the F1 and F2 populations tend to be intermediate between the means of the two parental lines. If there is no change in the environment from one generation to the next, then the environmental variation of the F2 should be approximately the same as that of the F1. An increase in phenotypic variance of the F2 over that of the F1 may then be attributed to genetic causes. Thus, the genotypic variance of the F2 ($\sigma^2_{GF_2}$) is equal to the phenotypic variance of the F2 ($\sigma^2_{PF_2}$) minus the phenotypic variance of the F1 ($\sigma^2_{PF_1}$):

    $$\sigma^2_{GF_2} = \sigma^2_{PF_2} - \sigma^2_{PF_1}$$

    The genetic variance of the F2 is expressed by the formula $\sigma^2_{GF_2} = a^2 N / 2$, where a is the contribution of each active allele and N is the number of pairs of genes involved in the quantitative trait. An estimate of a is obtained from the formula $a = D / 2N$, where D is the numerical difference between the two parental means. Making substitutions and solving for N,

    $$\sigma^2_{GF_2} = \frac{(D/2N)^2\, N}{2} = \frac{D^2}{8N}$$

    from which

    $$N = \frac{D^2}{8\,\sigma^2_{GF_2}}$$

    This formula is an obvious oversimplification, since it assumes that all genes contribute equally and cumulatively to the phenotype and that there is no dominance, no linkage, and no interaction. Much more sophisticated formulas have been developed to take such factors into consideration, but these are beyond the scope of this book.
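
    The chain of formulas can be checked with a small Python sketch. It assumes exactly the simplified model described above: N equal, additive gene pairs with no dominance, linkage, or interaction, and one parent carrying all 2N active alleles while the other carries none, so that D = 2Na. The function names and all numbers are hypothetical. The first function enumerates every F2 genotype class to confirm that $\sigma^2_{GF_2} = a^2 N / 2$; the second applies $N = D^2 / (8\,\sigma^2_{GF_2})$ after subtracting the F1 variance.

```python
import itertools

def f2_genetic_variance(a, n_pairs):
    """Genetic variance of an F2 under the simple additive model.

    Every locus segregates 1/4 AA : 1/2 Aa : 1/4 aa in the F2, and each active
    allele adds `a`, so a locus contributes 2a, a or 0. All 3**n_pairs genotype
    classes are enumerated, which is fine for a handful of loci.
    """
    locus = [(0.25, 2.0 * a), (0.5, a), (0.25, 0.0)]
    mean = 0.0
    second_moment = 0.0
    for combo in itertools.product(locus, repeat=n_pairs):
        prob = 1.0
        value = 0.0
        for p, v in combo:
            prob *= p
            value += v
        mean += prob * value
        second_moment += prob * value ** 2
    return second_moment - mean ** 2

def estimate_gene_pairs(d, var_p_f1, var_p_f2):
    """N = D^2 / (8 * var_G(F2)), where var_G(F2) = var_P(F2) - var_P(F1)."""
    var_g_f2 = var_p_f2 - var_p_f1
    if var_g_f2 <= 0:
        raise ValueError("F2 variance must exceed F1 variance to estimate N")
    return d ** 2 / (8.0 * var_g_f2)

# Hypothetical cross: 4 gene pairs, each active allele adding 1 unit.
a, n_true = 1.0, 4
d = 2 * n_true * a                      # parents differ by D = 2Na
var_g = f2_genetic_variance(a, n_true)  # the model predicts a^2 N / 2 = 2
var_p_f1 = 1.5                          # hypothetical environmental variance
var_p_f2 = var_p_f1 + var_g             # F2 = environment + segregation
print(f"var_G(F2) = {var_g:.4f}")
print(f"estimated N = {estimate_gene_pairs(d, var_p_f1, var_p_f2):.2f}")
```

    With these inputs the sketch recovers a genetic variance of 2 and N = 4. Real crosses rarely meet the model's assumptions, which is why the text calls the formula an oversimplification.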

