Sampling Distributions and the t-Distribution Study Guide
Introduction to Sampling Distributions and the t-Distribution Study Guide
The distribution of a random sample from a population is called the sample distribution. Using sample values, we can obtain estimates of the population parameters, such as the mean, the standard deviation, or a proportion. If we take another sample, it is very likely that the estimates from the second sample will differ from those of the first. Fortunately, this variation among estimates from different samples is predictable. The key to understanding this variation lies in gaining an understanding of sampling distributions (yes, these are different from sample distributions). Intuitively, it seems that as the sample size increases, we should do better. Sampling distributions in this lesson and the Law of Large Numbers in the next give us insight into what is meant by better.
Sampling Distributions Defined
We learned that a parameter is a summary measure of a population, and a statistic is a summary measure of a sample. A statistic is some function of sample values that does not involve any unknown quantities (such as parameters). An important statistical idea is that parameters are fixed, but generally unknown, and that statistics are known from the sample, but vary from sample to sample. It is critical to always keep in mind whether we are thinking about a parameter or a statistic.
Suppose we are interested in the mean arm span of females attending college in the United States. (The arm span is the distance from the tip of one middle finger to the tip of the other middle finger when the arms are fully extended to the sides and perpendicular to the body.) Ten different people independently estimate the mean arm span by taking a random sample of 20 U.S. college females and finding the sample mean of the arm spans. It is very likely that this will lead to ten different estimates of the population mean arm span. Because each person selected a different sample, the arm spans of the people within each sample, and thus the sample means, will tend to be different. We have ten observations from the sampling distribution of the sample mean, one from each sample. The sampling distribution of a statistic is the distribution of possible values of that statistic for repeated samples of the same size from a population. That is, the statistic, the sample mean in our example, is a random variable, and the sampling distribution is the distribution for that random variable. Fortunately, we know quite a bit about the sampling distributions of the statistics that we will be most interested in.
For each of the following circumstances, explain whether the quantity given (2.5 inches in the first, 35.3 years in the second) is a parameter or a statistic.
- For a sample of 20 married couples, there was a difference of 2.5 inches in the mean heights of the husbands and wives.
- The Census Bureau reported that, based on the 2000 Census, the median age of residents of Oklahoma was 35.3 years.
Solution
- The 2.5 inches is a statistic because it is based on a sample from the population of all married couples.
- The 35.3 years is a parameter. A census occurs when every unit in the population is contacted. The U.S. Census occurs every ten years and attempts to contact everyone in the United States.
For each of the following, specify whether \(\bar{x}\) or μ is the correct statistical notation for the mean described.
- A university administrator determines that the mean GPA of all students at the school is 2.4.
- The mean amount of money spent on groceries for one week was $80.40 for a random sample of 35 single adult men.
Solution
- Because all members of the population of interest (all students at the university) were included in finding the mean, the appropriate statistical notation is μ.
- Here, we have the sample mean \(\bar{x}\), which is an estimate of the mean amount spent by all single adult men (the population).
Sampling Distribution of the Sample Mean
Consider again measuring the arm span of 20 randomly selected college females. Suppose arm spans of college females are normally distributed with a mean of 65.5 inches and a standard deviation of 2.5 inches. Thus, 68% of the population values will be between 63 and 68 inches, and 95% will be between 60.5 and 70.5 inches.
Suppose we take repeated samples of size 20. The samples, and thus the sample means, vary from sample to sample. Following are three samples that could have been drawn, where all measurements are in inches.
Sample 1: 67.7, 62.3, 66.6, 67.0, 62.3, 66.1, 61.2, 65.8, 67.0, 64.6, 60.4, 64.8, 66.7, 63.9, 65.2, 68.9, 63.1, 66.0, 67.3, 71.1
Sample 2: 67.5, 66.3, 66.7, 64.8, 66.5, 64.6, 68.5, 62.5, 64.9, 64.1, 68.9, 68.2, 62.9, 63.3, 64.5, 66.2, 66.4, 65.7, 66.8, 64.7
Sample 3: 63.9, 65.1, 68.7, 67.2, 60.2, 63.4, 63.9, 63.6, 68.0, 65.5, 65.6, 62.6, 63.9, 68.6, 65.6, 63.5, 66.3, 59.9, 67.9, 68.0
The sample means for samples 1, 2, and 3 are 65.4, 65.7, and 65.07 inches, respectively. None is exactly equal to the population mean (65.5 inches) or to each other, but all were fairly close.
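The three sample means quoted above can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and are not part of the original guide.

```python
from statistics import mean

# Arm spans (inches) for the three samples listed in the text.
sample_1 = [67.7, 62.3, 66.6, 67.0, 62.3, 66.1, 61.2, 65.8, 67.0, 64.6,
            60.4, 64.8, 66.7, 63.9, 65.2, 68.9, 63.1, 66.0, 67.3, 71.1]
sample_2 = [67.5, 66.3, 66.7, 64.8, 66.5, 64.6, 68.5, 62.5, 64.9, 64.1,
            68.9, 68.2, 62.9, 63.3, 64.5, 66.2, 66.4, 65.7, 66.8, 64.7]
sample_3 = [63.9, 65.1, 68.7, 67.2, 60.2, 63.4, 63.9, 63.6, 68.0, 65.5,
            65.6, 62.6, 63.9, 68.6, 65.6, 63.5, 66.3, 59.9, 67.9, 68.0]

samples = {"Sample 1": sample_1, "Sample 2": sample_2, "Sample 3": sample_3}

# Each sample mean is one observation from the sampling distribution of the mean.
sample_means = {name: round(mean(values), 2) for name, values in samples.items()}

for name, value in sample_means.items():
    print(f"{name}: mean = {value} inches")
```

Each printed value is a statistic: it is known once the sample is drawn, but it changes from sample to sample.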
This process of selecting samples of size 20 was repeated 10,000 times using computer simulation, and the histogram of the simulated sampling distribution of sample means is presented in Figure 12.1. In addition, the normal distribution that best fits the data is superimposed on the histogram. Notice that the normal curve provides a good representation of the histogram. The sample mean of the 10,000 sample means (the average of the sample means from all 10,000 samples of size 20) is 65.02, very close to the population mean of 65.5, and the sample standard deviation is 0.559. In addition, almost all of the values, not just 68% of them, are between 63 and 68 inches.
The sample standard deviation of the sample means is much smaller than the population standard deviation. Why? The population standard deviation is a measure of the spread in the arm span measurements of college females; it is a measure of the spread in the distribution of the random variable X, the arm span of a college female measured in inches. The sample standard deviation of the sample means based on samples of size 20 is a measure of the spread of the sample means, not the individuals; it is a measure of the spread in the sampling distribution of the random variable \(\bar{x}\), the sample mean of the arm spans of 20 college females measured in inches.
What would happen if the sample size changed? To find out, we simulated 10,000 samples of size 50 from the same normal distribution with a mean of 65.5 and a standard deviation of 2.5. The sample mean was computed for each sample, and a histogram of this simulated sampling distribution is shown in Figure 12.2. Although the two graphs look similar at first glance, notice that the range of the sample means is quite different. For this set of 10,000 sample means, the average of the sample means is 65.494, and the sample standard deviation is 0.353, which is substantially less than the sample standard deviation of the sample means when the sample size was 20.
If a random sample of size n is taken from a normal distribution with mean μ and standard deviation σ, the sampling distribution of the sample mean \(\bar{x}\) is normal with mean μ and standard deviation \(\sigma/\sqrt{n}\). That is, the sampling distribution of \(\bar{x}\) is centered at the same value as the population distribution, but it has less spread. The spread in the sampling distribution, measured by \(\sigma/\sqrt{n}\), decreases as the sample size increases. When we simulated the sampling distribution by generating 10,000 samples of size n = 20, the sample standard deviation of the sampling distribution of \(\bar{x}\) was 0.559. We believe that it should be \(\sigma/\sqrt{n} = 2.5/\sqrt{20} = 0.55901\). (Here, we carried more decimal places than usual to aid the comparison.) These are really close! What about when n = 50? We have \(\sigma/\sqrt{n} = 2.5/\sqrt{50} = 0.35355\). Again, there is very good agreement with the simulated sampling distribution! This does not prove the statements are true; such a proof requires methods beyond this book. It does help us feel comfortable that the formula works.
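A simulation along the lines described can be sketched in Python. This is not the authors' original simulation: the seed, the function name `simulate_sample_means`, and the use of the standard library's `random.gauss` are assumptions made for illustration, so the printed numbers will differ slightly from those quoted in the text.

```python
import random
from math import sqrt
from statistics import fmean, stdev

MU, SIGMA = 65.5, 2.5   # population mean and SD from the text (inches)
N_SAMPLES = 10_000      # number of simulated samples, as in the text


def simulate_sample_means(n, rng):
    """Draw N_SAMPLES normal samples of size n and return their sample means."""
    return [fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
            for _ in range(N_SAMPLES)]


rng = random.Random(12)  # arbitrary fixed seed so the run is repeatable

results = {}
for n in (20, 50):
    means = simulate_sample_means(n, rng)
    simulated_mean = fmean(means)       # average of the sample means
    simulated_sd = stdev(means)         # spread of the sample means
    theoretical_sd = SIGMA / sqrt(n)    # sigma / sqrt(n)
    results[n] = (simulated_mean, simulated_sd, theoretical_sd)
    print(f"n = {n}: mean of means = {simulated_mean:.3f}, "
          f"SD of means = {simulated_sd:.4f}, "
          f"sigma/sqrt(n) = {theoretical_sd:.5f}")
```

If the formula holds, the simulated standard deviation of the sample means should sit close to \(\sigma/\sqrt{n}\) for both sample sizes, and the average of the sample means should sit close to 65.5.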
Notice how close the average of the 10,000 \(\bar{x}\) values was to the population mean μ for samples of size n = 20 and n = 50, just as was predicted. The fact that the mean of the sampling distribution of the sample mean is equal to the population mean indicates that the sample mean \(\bar{x}\) is an unbiased estimator of the population mean μ. In general, an unbiased statistic is a statistic with mean value equal to the value of the population characteristic being estimated. We usually want to use an unbiased statistic to estimate a population parameter of interest.
If \(\bar{x}\) has a normal distribution with mean μ and standard deviation \(\sigma/\sqrt{n}\), by the properties of the normal distribution, we know that \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) has a standard normal distribution. That is, the sampling distribution of z is a standard normal. Thus, we can use the standard normal tables to find probabilities that the sample mean falls in intervals of interest, such as the probability the sample mean exceeds a specified value or is between two values, just as we did in Lesson 11.
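As an illustration of this standardization, the sketch below computes the probability that the mean arm span of a sample of 20 exceeds a cutoff. The cutoff of 66 inches is a hypothetical value chosen for the example; it does not come from the text. The standard library's `NormalDist` stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 65.5, 2.5, 20   # population values and sample size from the text
cutoff = 66.0                  # hypothetical cutoff (inches), for illustration only

sd_of_mean = sigma / sqrt(n)           # standard deviation of the sample mean
z = (cutoff - mu) / sd_of_mean         # standardized cutoff

# Right-tail probability from the standard normal distribution.
p_exceeds = 1 - NormalDist().cdf(z)

# The same probability, computed directly from the distribution of the sample mean.
p_direct = 1 - NormalDist(mu, sd_of_mean).cdf(cutoff)

print(f"z = {z:.3f}")
print(f"P(sample mean > {cutoff}) is about {p_exceeds:.4f}")
```

Both routes give the same probability, which is the point of standardizing: one standard normal table serves every choice of μ, σ, and n.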
In a large population of high school students, the number of hours spent studying during any given week is normally distributed with mean 4.5 hours and standard deviation 18.9 hours. Consider randomly selected samples of size n = 100 students.
- What is the mean of the sampling distribution of the sample means?
- What is the standard deviation of the sampling distribution of the sample means?
- Use properties of the normal distribution to fill in the blanks in the following sentence: "For 68% of all randomly selected samples of size n = 100 students, the mean amount of time spent studying during a week will be between___and___hours."
Solution
- The mean of the sampling distribution of the sample means is 4.5 hours, the same as the population mean.
- The standard deviation of the sampling distribution of the sample means is \(\sigma/\sqrt{n} = 18.9/\sqrt{100} = 1.89\) hours.
- Because the population of the numbers of hours the high school students spent studying during the week is normally distributed, the distribution of the sample means is also normal. For a normal distribution, 68% of the values lie within one standard deviation of the mean. Thus, 68% of the sample means would lie within 1.89 hours (the standard deviation of the sampling distribution of sample means) of 4.5 hours (the mean of the sampling distribution of sample means). Thus, in repeated samples of n = 100, 68% of the samples will estimate the mean amount of time high school students spent studying during the week to be between 2.6 and 6.4 hours (4.5 – 1.89 = 2.61 and 4.5 + 1.89 = 6.39).
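The arithmetic in this solution can be reproduced with a few lines of Python. This is a sketch using the values given in the exercise; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pop_mean, pop_sd, sample_size = 4.5, 18.9, 100   # values given in the exercise

# Standard deviation of the sampling distribution of the sample mean.
sd_of_sample_mean = pop_sd / sqrt(sample_size)

# One standard deviation on either side of the mean.
lower = pop_mean - sd_of_sample_mean
upper = pop_mean + sd_of_sample_mean

# Share of sample means expected to fall inside that interval.
sampling_dist = NormalDist(pop_mean, sd_of_sample_mean)
coverage = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(f"SD of the sample mean: {sd_of_sample_mean:.2f} hours")
print(f"Interval: {lower:.2f} to {upper:.2f} hours")
print(f"Coverage: {coverage:.4f}")
```

The coverage comes out slightly above 0.68, which is why the "68%" rule is described as a property of the normal distribution rather than an exact figure.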
Again, assume that a random sample has been drawn from a normal distribution. The sampling distribution of \(\bar{x}\) is normal. If we know the mean and standard deviation of the population, then we can answer many questions about \(\bar{x}\) and the values it may assume. In practice, σ is almost never known. Most statisticians have never had a real-life problem in which the standard deviation was known! (The same is true for the mean, but we will address this issue later.) Assuming that σ is known, we know that \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) has a standard normal distribution. If we use the sample standard deviation s instead of σ, then a different standardized random variable, denoted by t, results:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
When working with z, only one quantity is varying with each sample, \(\bar{x}\). For t, two quantities are varying, \(\bar{x}\) and s. The value of s may not be very close to σ, especially for small values of n. Consequently, the distribution of t tends to be more variable than the distribution of z, especially for small n.
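The extra variability of t can be seen in a small simulation. The sketch below is an illustration, not something taken from the original guide: the sample size of 5, the 20,000 repetitions, the seed, and the cutoff of 2 are all assumptions chosen to make the effect visible.

```python
import random
from math import sqrt
from statistics import fmean, stdev

MU, SIGMA = 65.5, 2.5   # population values from the arm span example
N = 5                   # deliberately small sample size (assumption)
REPS = 20_000           # number of simulated samples (assumption)

rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable

z_values, t_values = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    s = stdev(sample)                                   # sample SD (n - 1 divisor)
    z_values.append((x_bar - MU) / (SIGMA / sqrt(N)))   # uses the known sigma
    t_values.append((x_bar - MU) / (s / sqrt(N)))       # uses s in place of sigma


def share_beyond(values, cutoff=2.0):
    """Fraction of values farther than `cutoff` from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)


z_tail = share_beyond(z_values)
t_tail = share_beyond(t_values)
print(f"Share of |z| > 2: {z_tail:.3f}")
print(f"Share of |t| > 2: {t_tail:.3f}")
```

With so small a sample, noticeably more of the t values than of the z values land beyond 2 in absolute value, because s sometimes underestimates σ and inflates the ratio.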
The t-distribution is centered at zero. It has one parameter, called the degrees of freedom, abbreviated as df. The degrees of freedom are usually a function of the sample size n, but the exact relationship between df and n depends on the type of problem. For the application we are considering here, the degrees of freedom are (n – 1), one less than the sample size. The t-distribution is bell shaped and looks much like the standard normal, but it has thicker tails. The thicker tails reflect the increased variability of a t-distribution compared to the standard normal distribution. As the df increase, the tails become less thick and the distribution becomes more like the normal. The normal distribution and the t-distributions with 4 and 10 degrees of freedom are displayed in Figure 12.3.
As with the normal distribution, it is not easy to compute probabilities or to find specific values of t by hand. We must again rely on tabled values, calculators, or computers. For specified degrees of freedom, the tabulated t* in the body of the table is chosen so that P(t > t*) = α. Notice that in the normal table we first used in the previous lesson, we worked with the left-tail probabilities. Here, we have the right-tail probabilities. Also, for the normal distribution, the probabilities were in the body of the table. For the t-distribution, the t* values are in the body of the table (see Table 12.1).
- Find t* such that P(t > t*) = 0.05 when t has 8 degrees of freedom.
- Find t* such that P(|t| > t*) = 0.05 when t has 12 degrees of freedom.
- Find t* such that P(t < t*) = 0.05 when t has 16 degrees of freedom.
Solution
- Here, we want the right-tail probability to be 0.05. Right-tail probabilities are presented in the table. To find the proper value, look at the row with 8 df and the column headed by 0.05. The two intersect at t* = 1.86.
- We should first recall from algebra that P(|t| > t*) = P(t > t*) + P(t < –t*). Because the t-distribution is symmetric, P(t > t*) = P(t < –t*). Thus, we want to find t* such that P(t > t*) = 0.05/2 = 0.025. In the t-table, the row with 12 degrees of freedom intersects with the 0.025 column at t* = 2.179. Thus, P(t > 2.179) = 0.025. By symmetry, we also have P(t < –2.179) = 0.025. This gives us P(|t| > 2.179) = 0.05.
- This time, we are looking for a left-tail probability. We begin by finding the t* that would provide the same size right-tail probability; that is, find t* such that P(t > t*) = 0.05. At the intersection of the row for 16 degrees of freedom and the 0.05 column, we find 1.746. By symmetry, the left-tail probability would be 0.05 for t* = –1.746. Therefore, here t* = –1.746.
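The table lookups above can be cross-checked numerically. The sketch below is a from-scratch approximation using only the Python standard library: it integrates the t density with Simpson's rule and finds t* by bisection. The function names are hypothetical, and this is not how the printed table was produced; statistical software normally provides these values directly.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def right_tail(t_star, df, steps=2000):
    """P(t > t_star) for t_star >= 0, via Simpson's rule on [0, t_star]."""
    h = t_star / steps
    total = t_pdf(0.0, df) + t_pdf(t_star, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the probability lies above zero.
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Find t* >= 0 with P(t > t*) = alpha by bisection (requires alpha < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if right_tail(mid, df) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail too small: move left
    return (lo + hi) / 2


right_8 = t_critical(0.05, 8)           # P(t > t*) = 0.05, 8 df
two_sided_12 = t_critical(0.05 / 2, 12)  # P(|t| > t*) = 0.05, 12 df
left_16 = -t_critical(0.05, 16)          # P(t < t*) = 0.05, 16 df (symmetry)

print(f"8 df, right tail 0.05:  t* = {right_8:.3f}")
print(f"12 df, two tails 0.05:  t* = {two_sided_12:.3f}")
print(f"16 df, left tail 0.05:  t* = {left_16:.3f}")
```

The two-sided and left-tail cases reuse the right-tail routine, mirroring the symmetry argument used with the table.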
Throughout this lesson, we have talked about the "sample standard deviation of the sampling distribution of the sample mean." We have done this to emphasize that we are trying to estimate the standard deviation, not of the original population, but of the conceptual population of the sample means that could be derived from repeatedly taking random samples of size n and finding \(\bar{x}\) for each. Because the variability in \(\bar{x}\), \(\hat{p}\), and other estimators is so important, the term standard error is used to represent this idea. That is, the standard error of a statistic is the estimated standard deviation of the statistic. Thus, the "sample standard deviation of the sampling distribution of the sample mean" may be simply stated as the "standard error of \(\bar{x}\)." The standard error of the sample proportion \(\hat{p}\) is \(\sqrt{\hat{p}(1 - \hat{p})/n}\). For a sample mean, the standard deviation of \(\bar{x}\) is \(\sigma/\sqrt{n}\); if σ is known, there is no need to estimate that quantity using the standard error. However, in practice, one rarely, if ever, knows σ, so this case will not be considered further. If the mean is unknown, the standard deviation is generally unknown as well, so the sample standard deviation s is used to estimate it. Thus, when σ is unknown, the standard error of \(\bar{x}\) is \(s/\sqrt{n}\). Notice that as the sample size increases, the standard error decreases.
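A short sketch of both standard errors follows, assuming the usual formulas \(s/\sqrt{n}\) for a sample mean and \(\sqrt{\hat{p}(1 - \hat{p})/n}\) for a sample proportion. The arm span data are Sample 1 from earlier in the lesson; the counts used for the proportion (12 successes in 40 trials) are hypothetical and appear only to show the calculation.

```python
from math import sqrt
from statistics import mean, stdev

# Sample 1 from the arm span example (inches).
sample_1 = [67.7, 62.3, 66.6, 67.0, 62.3, 66.1, 61.2, 65.8, 67.0, 64.6,
            60.4, 64.8, 66.7, 63.9, 65.2, 68.9, 63.1, 66.0, 67.3, 71.1]

n = len(sample_1)
x_bar = mean(sample_1)
s = stdev(sample_1)        # sample standard deviation (n - 1 divisor)
se_mean = s / sqrt(n)      # standard error of the sample mean

# Hypothetical counts, for illustration only.
successes, trials = 12, 40
p_hat = successes / trials
se_prop = sqrt(p_hat * (1 - p_hat) / trials)   # standard error of the proportion

print(f"x-bar = {x_bar:.2f}, s = {s:.3f}, SE of x-bar = {se_mean:.3f}")
print(f"p-hat = {p_hat:.2f}, SE of p-hat = {se_prop:.4f}")
```

In both formulas the sample size sits under a square root in the denominator, so quadrupling n halves the standard error.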
Sampling Distributions and the t-Distribution Study Guide In Short
Each sample produces a sample mean. If we select another sample, the observations in the sample, and thus the value of the sample mean, are likely to change. When sampling from a normal distribution, the sampling distribution of the sample mean is also normal with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of n. The sample mean can be standardized using \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) if the population standard deviation is known and \(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\) if it is not. Because the sample standard deviation of the sampling distribution of the sample mean is frequently used, it is simply referred to as the standard error.
Find practice problems and solutions for these concepts at Sampling Distributions and the t-Distribution Practice Exercises.