Continuous Probability Distributions Study Guide
Introduction to Continuous Probability Distributions
As was the case with discrete distributions, some continuous random variables are of particular interest. In this lesson, we will discuss two of these: the uniform distribution and the normal distribution. The normal distribution is particularly important because many of the methods used in statistics are based on this distribution. The reasons for this will become clearer as we work through the rest of the lessons.
In the first lesson, we learned that a continuous random variable has a set of possible values that is an interval on the number line. It is not possible to assign a probability to each point in the interval and still satisfy the conditions of probability set forth in Lesson 10 for discrete random variables. Instead, the probability distribution of a continuous random variable X is specified by a mathematical function f(x) called the probability density function or just density function. The graph of a density function is a smooth curve.

A probability density function (pdf) must satisfy two conditions: (1) f(x) ≥ 0 for all real values of x and (2) the total area under the density curve is equal to 1. The graphs of three density functions are shown in Figure 11.1.
The probability that X lies in any particular interval is shown by the area under the density curve and above the interval. The following three events are frequently encountered: (1) X < a, the event that the random variable X assumes a value less than a; (2) a < X < b, the event that the random variable X assumes a value between a and b; and (3) X > b, the event that the random variable X is greater than b. We say that we are interested in the lower tail probability for (1) and the upper tail probability when using (3). The areas associated with each of these are shown in Figure 11.2.
Notice that the probability that a < X < b may be computed using tail probabilities:
P(a < X < b) = P(X < b) – P(X < a).
If the random variable X is equally likely to assume any value in an interval (a, b), then X is a uniform random variable. The pdf is flat and is above the x-axis between a and b, and it is 0 outside of the interval. The height of the curve must be such that the area under the density and above the x-axis is 1. Because this region is a rectangle, the area is the height times the width of the interval, which is b – a. Thus, the height must be 1/(b – a); that is, the pdf of a uniform random variable has the form

f(x) = 1/(b – a), for a < x < b
     = 0, otherwise.
A graph of the pdf is shown in Figure 11.3.
A group of volcanologists (people who study volcanoes) has been monitoring a volcano's seismicity, or the frequency and distribution of underlying earthquakes. Based on these readings, they believe that the volcano will erupt within the next 24 hours, but the eruption is equally likely to occur any time within that period. What is the probability that it will erupt within the next eight hours?
Define X = the time until the eruption of the volcano. X has positive probability over the interval (0, 24) because the volcano will erupt during that time interval. Because the length of the interval is 24 – 0 = 24, the height of the density curve must be 1/24 for the area under the density and above the x-axis to be one. That is, the pdf is

f(x) = 1/24, for 0 < x < 24
     = 0, otherwise.

The probability that the volcano will erupt within the next eight hours is equal to the area under the curve and above the interval (0, 8) as shown in Figure 11.4. This area is 8 × (1/24) = 1/3.
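This rectangular-area calculation is simple enough to sketch in code. The following is a minimal illustration (the function name is our own, not from the text) that computes P(c < X < d) for a uniform random variable on (a, b):

```python
def uniform_prob(a, b, c, d):
    """P(c < X < d) for X uniform on (a, b): the area of a rectangle
    of height 1/(b - a) over the part of (c, d) that lies inside (a, b)."""
    lo, hi = max(a, c), min(b, d)   # overlap of (c, d) with (a, b)
    if hi <= lo:
        return 0.0                  # no overlap: zero probability
    return (hi - lo) / (b - a)

# Volcano example: X uniform on (0, 24); P(eruption within 8 hours)
print(uniform_prob(0, 24, 0, 8))  # 1/3
```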
In the previous example, notice that the area is the same whether we have P(0 < X < 8), P(0 ≤ X < 8), P(0 < X ≤ 8), or P(0 ≤ X ≤ 8). Unlike with discrete random variables, the probability is the same for a continuous random variable whether or not the inequalities are strict. This also correctly implies that, for a continuous random variable, the probability that the random variable equals any specific value is 0.
Normal Probability Distributions
The normal distribution is perhaps the most widely used probability distribution, largely because it provides a reasonable approximation to the distribution of many random variables. It also plays a central role in many of the statistical methods that will be discussed in later lessons. Normal probability distributions are continuous probability distributions that are bell shaped and symmetric as displayed in Figure 11.5. The distribution is also called the Gaussian distribution or the bell-shaped curve.
The normal distribution has two parameters: the mean μ and the standard deviation σ. The notation X ~ N(μ, σ) means "X is normally distributed with a mean of μ and a standard deviation of σ." The distribution is symmetric about the mean. The mean, median, and mode are all equal. The mean is often referred to as the location parameter because it determines where the distribution is centered. The standard deviation determines the spread of the distribution. The effect of the mean and standard deviation on the normal distribution is displayed in Figure 11.6.
For any normal distribution, about 68% of the observations are within one standard deviation of the mean. About 95% and 99.7% of the observations are, respectively, within two and three standard deviations of the mean.
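These 68–95–99.7 figures can be checked numerically. As a sketch (not part of the lesson's table-based method), Python's standard library exposes the normal CDF through `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    # P(-k < z < k) = P(z < k) - P(z < -k)
    prob = z.cdf(k) - z.cdf(-k)
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

The three printed values are approximately 0.6827, 0.9545, and 0.9973, matching the rounded percentages above.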
It is important to remember that, although the location and spread may change, the area under the curve and above the x-axis is always 1. Unfortunately, the probabilities associated with intervals cannot be computed as easily as with the uniform distribution. To overcome this difficulty, we rely on a table of areas for a reference normal distribution called the standard normal distribution. The standard normal distribution is the normal distribution with μ = 0 and σ = 1. It is customary to use the letter z to represent a standard normal random variable.
We will first learn to compute probabilities for a standard normal random variable and then learn how to find them for any random variable. We will also want to be able to determine extreme values of z, such as the value that only 5% of the population exceeds or the value that 1% of the population is less than. To find either probabilities or extreme values, we need a table of standard normal curve areas, or we need a calculator or computer that can be used to find these values. Here, we will restrict ourselves to the use of tables. The standard normal table used here in Table 11.1 tabulates the probability of observing a value less than or equal to z (see Figure 11.7).
Graphs are extremely useful tools to help us understand what values we are searching for. We will do this for each problem we work.
Examples of Continuous Probability Distributions
Below are several worked examples of continuous probability distribution problems and their solutions.
Find P(z < 1.32).
Using the standard normal table, we find the row with 1.3 in the z column and move along that row to the 0.02 column to find 0.9066. Thus, P(z < 1.32) = 0.9066. Figure 11.8 shows the graphic image of this.
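If a table is not at hand, the same lower-tail area can be computed from the standard normal CDF, Φ(z) = ½(1 + erf(z/√2)). A quick check of the table value, as a sketch using only the standard library:

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(1.32), 4))  # 0.9066, matching the table entry
```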
Find P(z > 1.32).
From the table, we find P(z < 1.32) as we did in the previous example. Using some of the ideas of probability we learned earlier, we have P(z > 1.32) = 1 – P(z ≤ 1.32) = 1 – 0.9066 = 0.0934. See Figure 11.9.
Find P(z < –0.5).
There are no negative z-values in the table, so we cannot look this up directly. Instead, we use the symmetry of the normal distribution to find the probability (see Figure 11.10). That is,
P(z < –0.5) = P(z > 0.5)
            = 1 – P(z < 0.5)
            = 1 – 0.6915
            = 0.3085.
Find P(–1.45 < z < 0.76).
Figure 11.11 shows the solution.
First, we notice that P(–1.45 < z < 0.76) = P(z < 0.76) – P(z < –1.45). Now P(z < 0.76) can be found directly from the table to be 0.7764. Using the symmetry of the normal distribution again, P(z < –1.45) = P(z > 1.45) = 1 – P(z ≤ 1.45) = 1 – 0.9265 = 0.0735. Finally, P(–1.45 < z < 0.76) = P(z < 0.76) – P(z < –1.45) = 0.7764 – 0.0735 = 0.7029.
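The same interval calculation can be verified numerically. In this sketch, `statistics.NormalDist.cdf` plays the role of the table; the symmetry step is unnecessary because the CDF handles negative z-values directly:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
p = z.cdf(0.76) - z.cdf(-1.45)  # P(-1.45 < z < 0.76)
print(round(p, 4))  # close to the table-based answer of 0.7029
```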
Find the value z* such that P(z < z*) = 0.75.
This is different from the other problems we have considered. Instead of finding a probability, we are looking for a z-value. However, the same table will allow us to solve the problem. The difference is that we will look in the body of the table for a probability and then find the z-value associated with it. Looking in the body of the table, we find the values 0.7486 and 0.7517, which are the closest to the 0.75 of interest. By looking at the corresponding row and column headings, we find that P(z < 0.67) = 0.7486 and P(z < 0.68) = 0.7517. Because 0.7486 is closer to 0.75 than 0.7517, we take z* = 0.67. (Note: We could interpolate to find a more precise value of z*, but we will not go through this process here.) See Figure 11.12.
Find the value z* such that P(z > z*) = 0.05.
We need to have the probabilities in the form P(z < z*) to use the table. However, P(z > z*) = 1 – P(z ≤ z*). We can rewrite this as P(z ≤ z*) = 1 – P(z > z*) = 1 – 0.05 = 0.95. That is, if 5% of the population values are greater than z*, then 95% of the population values must be less than or equal to z*. Thus, we look for 0.95 in the body of the table and find 0.9495 and 0.9505 corresponding to z = 1.64 and z = 1.65, respectively. Because 0.95 is exactly halfway between 0.9495 and 0.9505, we have z* = 1.645. (This is the only time we don't just round to the closest value.) See Figure 11.13.
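This inverse lookup — finding the z-value that corresponds to a given probability — is exactly what a quantile (inverse CDF) function does. As a sketch, `statistics.NormalDist.inv_cdf` reproduces the z* = 1.645 result without a table:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
z_star = z.inv_cdf(0.95)  # P(z < z*) = 0.95, i.e., P(z > z*) = 0.05
print(round(z_star, 3))   # 1.645
```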
Find the value z* such that P(z < z*) = 0.01.
Because the standard normal distribution is symmetric about its mean 0, we know P(z < 0) = 0.5; because 0.01 is less than 0.5, z* must be less than 0. Also, because we have only positive values of z in the table, we cannot look for 0.01 directly in the table. However, again because of symmetry, we know that, if P(z < z*) = 0.01, then P(z > –z*) = 0.01. To use the table, we must find P(z ≤ –z*) = 1 – P(z > –z*) = 1 – 0.01 = 0.99. Looking in the body of the table, we find 0.9898 and 0.9901, corresponding to z = 2.32 and z = 2.33, respectively, to be the closest to 0.99. Because 0.9901 is the closer of the two to 0.99, we take –z* = 2.33; that is, z* = –2.33. See Figure 11.14.
Few normal random variables actually have a standard normal distribution. However, any normal random variable can be transformed to a standard normal, and any standard normal random variable can be transformed to a normal random variable with any mean μ and standard deviation σ. Specifically, if X ~ N(μ, σ), then z = (X – μ)/σ ~ N(0, 1). Further, if z ~ N(0, 1), then X = μ + σz ~ N(μ, σ). Using these relationships, we can find probabilities and extreme values for any normal random variable using the z-table. When doing this, it is important to do all calculations carefully.
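As a sketch of this standardization (the mean and standard deviation below are made-up numbers, not from the text): to find P(X < x) for X ~ N(μ, σ), convert x to z = (x – μ)/σ and look up the lower-tail area for z.

```python
from statistics import NormalDist

def normal_prob_below(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma), computed via standardization."""
    z = (x - mu) / sigma          # transform X to a standard normal value
    return NormalDist().cdf(z)    # lower-tail area under the standard normal

# Hypothetical example: mu = 100, sigma = 15, so P(X < 130) = P(z < 2)
print(round(normal_prob_below(130, 100, 15), 4))  # 0.9772
```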