Normal Distribution for AP Statistics

By McGraw-Hill Professional
Updated on Feb 5, 2011

Practice problems for these concepts can be found at:

We have been discussing characteristics of distributions (shape, center, spread) and of the individual terms (percentiles, z-scores) that make up those distributions. Certain distributions are of particular interest to us in statistics, especially those that are known to be symmetric and mound shaped. The following histogram represents the heights of 100 males whose average height is 70'' and whose standard deviation is 3''.

[Figure: histogram of the heights of 100 males]

This is clearly approximately symmetric and mound shaped. We are going to model this with a curve that idealizes what we see in this sample of 100. That is, we will model this with a continuous curve that "describes" the shape of the distribution for very large samples. That curve is the graph of the normal distribution. A normal curve, when superimposed on the above histogram, looks like this:

[Figure: the same histogram with a normal curve superimposed]

The function that yields the normal curve is defined completely in terms of its mean and standard deviation. Although you are not required to know it, you might be interested to know that the function that defines the normal curve is:

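$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

One consequence of this definition is that the total area under the curve, and above the x-axis, is 1 (for you calculus students, this is because $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 1$).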

This fact will be of great use to us later when we consider areas under the normal curve as probabilities.
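The claim that the total area is 1 can be checked numerically. The following is a minimal sketch, not part of the original text: it codes the density function directly and approximates the area with a midpoint sum. The helper names `normal_pdf` and `area_under_curve` are made up for illustration, and the mean of 70 and standard deviation of 3 are borrowed from the heights example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, width=10.0, steps=100_000):
    """Midpoint-rule approximation of the area from mu - width*sigma to mu + width*sigma."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

# Heights example: mean 70, standard deviation 3 (the area beyond 10 sigma is negligible)
total_area = area_under_curve(70, 3)
print(total_area)
```

The printed value should be extremely close to 1, whatever mean and (positive) standard deviation are used.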

Empirical Rule

The empirical rule, or the 68-95-99.7 rule, states that approximately 68% of the terms in a normal distribution are within one standard deviation of the mean, 95% are within two standard deviations of the mean, and 99.7% are within three standard deviations of the mean. The following three graphs illustrate the empirical rule.

[Figure: three normal curves with the areas within one, two, and three standard deviations of the mean shaded]
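The three percentages in the empirical rule can be reproduced with a short computation. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) rather than the calculator functions discussed in this chapter; the helper name `within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The results, roughly 0.6827, 0.9545, and 0.9973, show that 68, 95, and 99.7 are rounded values.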

Standard Normal Distribution

Because we are dealing with a theoretical distribution, we will use μ and σ, rather than x̄ and s, when referring to the normal curve. If X is a variable that has a normal distribution with mean μ and standard deviation σ (we say "X has N(μ, σ)"), there is a related distribution we obtain by standardizing the data in the distribution to produce the standard normal distribution. To do this, we convert the data to a set of z-scores, using the formula

$$z = \frac{x - \mu}{\sigma}.$$

The algebraic effect of this, as we saw earlier, is to produce a distribution of z-scores with mean 0 and standard deviation 1. Computing z-scores is just a linear transformation of the original data, which means that the transformed data will have the same shape as the original distribution. In this case then, the distribution of z-scores is normal. We say z has N(0,1). This simplifies the defining density function to

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}.$$
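To see the effect of standardizing on actual numbers, here is a small sketch. The list `heights` is a hypothetical sample invented for this illustration, not data from the text, and the population standard deviation (`pstdev`) is used so that the z-scores come out with standard deviation exactly 1.

```python
from statistics import mean, pstdev

heights = [64, 67, 69, 70, 70, 71, 73, 76]  # hypothetical sample, in inches

mu = mean(heights)
sigma = pstdev(heights)  # population standard deviation of the sample

# Standardize: subtract the mean, then divide by the standard deviation
z_scores = [(x - mu) / sigma for x in heights]

print(mean(z_scores), pstdev(z_scores))
```

Up to floating-point rounding, the z-scores have mean 0 and standard deviation 1. Because the transformation is linear, the relative spacing of the values, and so the shape of the distribution, is unchanged.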

For the standardized normal curve, the empirical rule says that approximately 68% of the terms lie between z = -1 and z = 1, 95% between z = -2 and z = 2, and 99.7% between z = -3 and z = 3. (Trivia for calculus students: the points one standard deviation from the mean are the points of inflection of the curve.)

Because many naturally occurring distributions are approximately normal (heights, SAT scores, for example), we are often interested in knowing what proportion of terms lie in a given interval under the normal curve. Problems of this sort can be solved either by use of a calculator or a table of Standard Normal Probabilities (Table A in this book). In a typical table, the marginal entries are z-scores, and the table entries are the areas under the curve to the left of a given z-score. All statistics texts have such tables.


example: What proportion of the area under a normal curve lies to the left of z = -1.37?

solution: There are two ways to do this problem, and you should be able to do it either way.

  1. The first way is to use the table of Standard Normal Probabilities. To read the table, move down the left column (titled "z") until you come to the row whose entry is -1.3. The hundredths digit, the 0.07 part, is found by reading across the top row until you come to the column whose entry is 0.07. The entry at the intersection of the row containing -1.3 and the column containing 0.07 is the area under the curve to the left of z = -1.37. That value is 0.0853.
  2. The second way is to use your calculator. It is the more accurate and more efficient way. In the DISTR menu, the second entry is normalcdf (see the next Calculator Tip for a full explanation of the normalpdf and normalcdf functions). The calculator syntax for a standard normal distribution is normalcdf(lower bound, upper bound). In this example, the lower bound can be any large negative number, say -100. normalcdf(-100, -1.37) = 0.0853435081.


example: What proportion of the area under a normal curve lies between z = -1.2 and z = 0.58?

solution: (i) Reading from Table A, the area to the left of z = -1.2 is 0.1151, and the area to the left of z = 0.58 is 0.7190. The geometry of the situation (see below) tells us that the area between the two values is 0.7190 - 0.1151 = 0.6039.

[Figure: standard normal curve with the area between z = -1.2 and z = 0.58 shaded]

(ii) Using the calculator, we have normalcdf(-1.2, 0.58) = 0.603973005. Round to 0.6040 (the difference from the answer in part (i) is caused by rounding).

example: In an earlier example, we saw that heights of men are approximately normally distributed with a mean of 70'' and a standard deviation of 3''. What proportion of men are more than 6' (72'') tall? Be sure to include a sketch of the situation.

solution: (i) Another way to state this is to ask what proportion of terms in a normal distribution with mean 70 and standard deviation 3 are greater than 72. In order to use the table of Standard Normal Probabilities, we must first convert to z-scores. The z-score corresponding to a height of 72'' is

$$z = \frac{72 - 70}{3} = 0.67.$$

[Figure: normal curve with mean 70, with the area to the right of 72 shaded]

The area to the left of z = 0.67 is 0.7486. However, we want the area to the right of 0.67, and that is 1 - 0.7486 = 0.2514.

(ii) Using the calculator, we have normalcdf(0.67, 100) = 0.2514. We could get the answer from the raw data as follows: normalcdf(72, 1000, 70, 3) = 0.2525, with the difference being due to rounding the z-score to 0.67. (As explained in the last Calculator Tip, simply add the mean and standard deviation of a nonstandard normal curve to the list of parameters for normalcdf.)
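The two examples above can be reproduced in a few lines. This is a sketch using `statistics.NormalDist` from the Python standard library as a stand-in for the calculator's normalcdf; the variable names are chosen for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Area between z = -1.2 and z = 0.58: subtract the two left-hand areas
between = std_normal.cdf(0.58) - std_normal.cdf(-1.2)

# Proportion of men taller than 72'' when heights have N(70, 3)
heights = NormalDist(mu=70, sigma=3)
taller_raw = 1 - heights.cdf(72)          # works from the raw data

# Same question by way of a z-score rounded to two places, as with the table
z_72 = round((72 - 70) / 3, 2)            # 0.67
taller_rounded = 1 - std_normal.cdf(z_72)

print(round(between, 4), round(taller_raw, 4), round(taller_rounded, 4))
```

The last two values differ in the third decimal place only because the z-score was rounded to 0.67 before the area was computed.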

example: For the population of men in the previous example, how tall must a man be to be in the top 10% of all men in terms of height?

solution: This type of problem has a standard approach. The idea is to express z_x, the z-score of the unknown height x, in two different ways (which are, of course, equal since they are different ways of writing the z-score for the same point): (i) as a numerical value obtained from Table A or from your calculator and (ii) in terms of the definition of a z-score.

[Figure: normal curve with mean 70, with the top 10% of the area shaded to the right of x]

  1. We are looking for the value of x in the drawing. Look at Table A to find the nearest table entry equal to 0.90 (because we know an area, we need to read the table from the inside out to the margins). It is 0.8997 and corresponds to a z-score of 1.28.
  2. Setting the definition of a z-score equal to this value gives

     $$z_x = \frac{x - 70}{3} = 1.28 \quad\Longrightarrow\quad x = 70 + 1.28(3) = 73.84.$$

     A man would have to be at least 73.84'' tall to be in the top 10% of all men.

  3. Using the calculator, the z-score corresponding to an area of 90% to the left of x is given by invNorm(0.90) = 1.28. Otherwise, the solution is the same as above. See the following Calculator Tip for a full explanation of the invNorm function.
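The same steps can be sketched in Python. Here `NormalDist.inv_cdf` from the standard library plays the role of the calculator's invNorm; this is an illustrative equivalent, not the method the text itself uses.

```python
from statistics import NormalDist

# z-score with 90% of the area to its left (the counterpart of invNorm(0.90))
z_90 = NormalDist().inv_cdf(0.90)

# Solve z = (x - 70) / 3 for x
cutoff_from_z = 70 + z_90 * 3

# Or ask the N(70, 3) distribution for its 90th percentile directly
cutoff_direct = NormalDist(mu=70, sigma=3).inv_cdf(0.90)

print(round(z_90, 2), round(cutoff_from_z, 2), round(cutoff_direct, 2))
```

Both routes give a cutoff of about 73.84'', in agreement with the table-based answer.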