**Normal Distribution**

The study of a quantitative trait in a large population usually reveals that very few individuals possess the extreme phenotypes and that progressively more individuals are found nearer the average value for that population. This type of symmetrical distribution is characteristically bell-shaped, as shown in Fig. 8-2, and is called a **normal distribution**. It is approximated by the **binomial distribution** (*p+q*)^{n} when the power of the binomial is very large and *p* and *q* are both 1/*n* or greater; *p* and *q* represent the probabilities of alternative independent events, *p* + *q* = 1.
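
The way a binomial approaches the bell-shaped curve can be checked numerically. The following is a minimal sketch, not part of the original text: the choice of *n* = 100 and *p* = 0.5 is an arbitrary illustration, and the matching normal curve uses the standard results that a binomial has mean *np* and variance *npq*.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' among n independent events."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative values only (assumptions, not taken from the text).
n, p = 100, 0.5
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

# Largest difference between the two curves over all possible outcomes.
max_gap = max(
    abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1)
)
print(f"largest gap between binomial and normal: {max_gap:.6f}")
```

With a power this large the two curves differ by well under 0.001 at every point, which is the sense in which the binomial approximates the normal distribution.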

**Average Measurements**

The average phenotypic value for a normally distributed trait is expressed as the **arithmetic mean**, $\bar{X}$ (read "X bar"). The arithmetic mean is the sum of the individual measurements (Σ *X*_{i}) divided by the number of individuals measured (*N*):

$$\bar{X} = \frac{\sum X_i}{N}$$

The Greek letter "sigma" (Σ) directs the statistician to sum what follows.

It is usually not feasible to measure every individual in a population; therefore, measurements are usually made on a sample from that population in order to estimate the population value (parameter). If the sample is truly representative of the larger population of which it is a part, then the arithmetic mean will be an accurate estimate of the mean of the entire population (*μ*). Note that letters from the English alphabet are used to represent **statistics**, i.e., measurements derived from a sample, whereas Greek letters are used to represent **parameters**, i.e., attributes of the population from which the sample was drawn. Parameters are seldom known and must be estimated from results gained by sampling. Obviously, the larger the sample size, the more accurately the statistic estimates the parameter.

**Measurements of Variability**

**Standard Deviation.** Consider the three normally distributed populations shown in Fig. 8-3. Populations A and C have the same mean, but C is much more variable than A. Populations A and B have different means, but otherwise appear to have the same shape (dispersion). Therefore, in order to adequately define a normal distribution, we must know not only its mean but also how much variability exists. One of the most useful measures of variability in a population for genetic purposes is the **standard deviation**, symbolized by the lowercase Greek letter "sigma" (σ). A sample drawn from this population at random will have a sample standard deviation (*s*). To calculate *s*, the sample mean ($\bar{X}$) is subtracted from each individual measurement (*X*_{i}), each deviation ($X_i - \bar{X}$) is squared, and the squared deviations are summed over all individuals in the sample and divided by *n* – 1, where *n* is the sample size. The calculation is completed by taking the square root of this value:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

**Coefficient of Variation.** Traits with relatively large average metric values generally are expected to have correspondingly larger standard deviations than traits with relatively small average metric values. Furthermore, since different traits may be measured in different units, the **coefficients of variation** are useful for comparing their relative variabilities. Dividing the standard deviation by the mean renders the coefficient of variation independent of the units of measurement:

$$\text{coefficient of variation} = \frac{\sigma}{\mu}$$

**Variance.** The square of the standard deviation is called **variance** (σ^{2}). Unlike the standard deviation, however, variance cannot be plotted on the normal curve and can only be represented mathematically. Variance is widely used as an expression of variability because of the additive nature of its components.
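
The three measures of variability described here (standard deviation, coefficient of variation, and variance) can be sketched in a few lines. This is an illustrative sketch only: the function names and the list of plant heights are hypothetical and do not come from the text.

```python
import math

def sample_mean(xs):
    """Arithmetic mean: sum of the measurements divided by their number."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean (unit-free)."""
    return sample_std(xs) / sample_mean(xs)

# Hypothetical plant heights in inches (made-up data for illustration).
heights = [50.0, 53.0, 56.0, 59.0, 62.0]

print(sample_mean(heights))               # 56.0
print(sample_variance(heights))           # 22.5
print(round(sample_std(heights), 3))
print(round(coefficient_of_variation(heights), 4))
```

Because the coefficient of variation is a ratio of two quantities in the same units, converting the heights to centimeters would change the mean and standard deviation but leave the coefficient unchanged.
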
By a technique called "analysis of variance," the total phenotypic variance (σ^{2}_{P}) expressed by a given trait in a population can be statistically partitioned into components of genetic variance (σ^{2}_{G}), nongenetic (or environmental) variance (σ^{2}_{E}), and variance due to genotype-environment interactions (σ^{2}_{GE}). Thus,

$$\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{GE}$$

**Variance Method of Estimating the Number of Genes.** A population such as a line, a breed, a variety, a strain, a subspecies, etc., is composed of individuals that are more nearly alike in their genetic composition than those in the species as a whole. Phenotypic variability will usually be expressed even in a group of organisms that are genetically identical. All such variability within pure lines is obviously environmental in origin. Crosses between two pure lines produce a genetically uniform hybrid F_{1}. Phenotypic variability in the F_{1} is likewise nongenetic in origin. In the formation of the F_{2} generation, genes are reshuffled into new combinations in the F_{2} individuals. It is a common observation that the F_{2} generation is much more variable than the F_{1} from which it was derived (Fig. 8-4).

To calculate σ, we use the total population size (*N*) in place of *n* – 1 as the denominator in the formula for *s*, with deviations taken from the population mean (*μ*). For samples smaller than about 30, the appropriate denominator is *n* – 1; for larger samples, it makes little difference in the value of *s* whether *n* or *n* – 1 is used in the denominator.

All other things being equal, the larger the sample size, the more accurately the statistic *s* should estimate the parameter σ. Calculators can be used to accumulate the sum of the measurements and the sum of their squares. This usually makes it easier to calculate *s* by the equivalent formula

$$s = \sqrt{\frac{\sum X_i^2 - \left(\sum X_i\right)^2 / n}{n - 1}}$$
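
The shortcut based on accumulated sums and sums of squares, and the effect of the choice of denominator, can be sketched as follows. This is a hedged illustration: `std_shortcut`, its `correction` parameter, and the data are hypothetical names and values introduced here, and Python's standard `statistics` module is used only as an independent check.

```python
import math
import statistics

def std_shortcut(xs, correction=1):
    """Standard deviation from running totals of X and X squared.

    correction=1 divides by n - 1 (sample statistic s);
    correction=0 divides by n (whole-population value).
    """
    n = len(xs)
    sum_x = sum(xs)                   # accumulated sum of measurements
    sum_x2 = sum(x * x for x in xs)   # accumulated sum of squared measurements
    return math.sqrt((sum_x2 - sum_x**2 / n) / (n - correction))

# Hypothetical measurements (made-up data for illustration).
data = [50.0, 53.0, 56.0, 59.0, 62.0]

s_shortcut = std_shortcut(data)                     # divides by n - 1
sigma_shortcut = std_shortcut(data, correction=0)   # divides by n

# The shortcut agrees with the deviation-based definitions.
print(s_shortcut, statistics.stdev(data))
print(sigma_shortcut, statistics.pstdev(data))
```

With only five measurements the two denominators give noticeably different answers; as the number of measurements grows, the ratio between them, the square root of (*n* – 1)/*n*, approaches 1.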

It is the property of every normal distribution that approximately 2/3 of the measurements (68%) will lie within plus or minus one standard deviation from the mean (μ ± σ). Approximately 19/20 of the measurements (95%) will lie within two standard deviations of the mean (μ ± 2σ). More than 99% of the measurements will be found within plus or minus three standard deviations of the mean (μ ± 3σ).
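
These fractions can be checked with the error function from Python's standard library. The sketch below is illustrative only; `fraction_within` and `tail_fraction` are hypothetical helper names, and the identity used (the fraction within *k* standard deviations equals erf(*k*/√2)) is a standard property of the normal distribution rather than something stated in the text.

```python
import math

def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def tail_fraction(k):
    """Fraction lying beyond k standard deviations on ONE side of the mean."""
    return (1 - fraction_within(k)) / 2

for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.4f}   one tail beyond: {tail_fraction(k):.4f}")
# The three 'within' values are approximately 0.683, 0.954, and 0.997.
```

Because the curve is symmetrical, the roughly 5% of measurements falling outside two standard deviations are split evenly, about 2.5% in each tail.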

**EXAMPLE 8.1** The mean height of a sample from a plant population is 56 in; the sample standard deviation is 6 in. This indicates that approximately 2/3 of the sample will be found between the values 56 ± 6, i.e., from 50 in to 62 in. Approximately 2.5% of all plants in this sample will measure smaller than 56 – (2 × 6) = 56 – 12 = 44 in, and 2.5% will measure larger than 56 + (2 × 6) = 68 in.

The standard deviation can be plotted on a normal distribution by locating the point of inflection of the curve (point of maximum slope). A perpendicular constructed from the baseline that intersects the curve at this point is one standard deviation from the mean (Fig. 8-2).

It is beyond the scope of this text to present the analysis of variance, but a knowledge of variance components is essential to a discussion of breeding theory. Both the genetic variance and environmental variance can be further partitioned by this technique, so that the relative contributions of a number of factors influencing a quantitative trait can be ascertained. In order to simplify discussion, we shall ignore the interaction component.

**EXAMPLE 8.2** An analysis of variance performed on the birth weights of humans produced the following results:

In a normally distributed trait, the means of the F_{1} and F_{2} populations tend to be intermediate between the means of the two parental lines. If there is no change in the environment from one generation to the next, then the environmental variation of the F_{2} should be approximately the same as that of the F_{1}. An increase in phenotypic variance of the F_{2} over that of the F_{1} may then be attributed to genetic causes. Thus, the genotypic variance of the F_{2} (σ^{2}_{GF2}) is equal to the phenotypic variance of the F_{2} (σ^{2}_{PF2}) minus the phenotypic variance of the F_{1} (σ^{2}_{PF1}):

$$\sigma^2_{GF_2} = \sigma^2_{PF_2} - \sigma^2_{PF_1}$$

The genetic variance of the F_{2} is expressed by the formula σ^{2}_{GF2} = (*a*^{2}*N*)/2, where *a* is the contribution of each active allele and *N* is the number of pairs of genes involved in the quantitative trait. An estimate of *a* is obtained from the formula *a* = *D*/2*N*, where *D* is the numerical difference between the two parental means. Making substitutions and solving for *N*,

$$\sigma^2_{PF_2} - \sigma^2_{PF_1} = \sigma^2_{GF_2} = \frac{a^2 N}{2} = \frac{D^2}{8N}$$

from which

$$N = \frac{D^2}{8\left(\sigma^2_{PF_2} - \sigma^2_{PF_1}\right)}$$

This formula is an obvious oversimplification since it assumes that all genes are contributing cumulatively the same amount to the phenotype, and that there is no dominance, no linkage, and no interaction. Much more sophisticated formulas have been developed to take such factors into consideration, but these are beyond the scope of this book.
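
The estimate can be sketched in code under the same simplifying assumptions (equal, cumulative gene effects; no dominance, linkage, or interaction). The function names and the numbers in the usage lines are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def genetic_variance_f2(var_p_f2, var_p_f1):
    """Genetic variance of the F2: F2 phenotypic variance minus F1 phenotypic variance."""
    return var_p_f2 - var_p_f1

def estimate_gene_pairs(parent_mean_1, parent_mean_2, var_p_f2, var_p_f1):
    """Estimate N, the number of gene pairs.

    Substituting a = D / (2N) into var_G = a**2 * N / 2 gives
    var_G = D**2 / (8N), so N = D**2 / (8 * var_G).
    """
    d = abs(parent_mean_1 - parent_mean_2)   # D: difference between parental means
    var_g = genetic_variance_f2(var_p_f2, var_p_f1)
    if var_g <= 0:
        raise ValueError("F2 variance must exceed F1 variance to estimate N")
    return d**2 / (8 * var_g)

# Hypothetical values (made up for illustration):
# parental means 20 and 60, F1 variance 10, F2 variance 60.
n_pairs = estimate_gene_pairs(20.0, 60.0, var_p_f2=60.0, var_p_f1=10.0)
print(n_pairs)  # 4.0
```

The guard against a nonpositive genetic variance reflects a practical point: sampling error can make an observed F_{2} variance no larger than the F_{1} variance, in which case the formula gives no meaningful estimate.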
