Sampling Distributions and the t-Distribution Study Guide (page 2)
Introduction to Sampling Distributions and the t-Distribution Study Guide
The distribution of a random sample from a population is called the sample distribution. Using sample values, we can obtain estimates of the population parameters, such as the mean, the standard deviation, or a proportion. If we take another sample, it is very likely that the estimates from the second sample will differ from those of the first. Fortunately, this variation among estimates from different samples is predictable. The key to understanding this variation lies in gaining an understanding of sampling distributions (yes, these are different from sample distributions). Intuitively, it seems that as the sample size increases, we should do better. Sampling distributions in this lesson and the Law of Large Numbers in the next give us insight into what is meant by better.
Sampling Distributions Defined
We learned that a parameter is a summary measure of a population, and a statistic is a summary measure of a sample. A statistic is some function of sample values that does not involve any unknown quantities (such as parameters). An important statistical idea is that parameters are fixed but generally unknown, whereas statistics are known from the sample but vary from sample to sample. It is critical to always keep in mind whether we are thinking about a parameter or a statistic.
Suppose we are interested in the mean arm span of females attending college in the United States. (The arm span is the distance from the tip of one middle finger to the tip of the other middle finger when the arms are fully extended to the sides and perpendicular to the body.) Ten different people independently estimate the mean arm span by taking a random sample of 20 U.S. college females and finding the sample mean of the arm spans. It is very likely that this will lead to ten different estimates of the population mean arm span. Because each person selected a different sample, the arm spans of the people within each sample, and thus the sample means, will tend to be different. We have ten observations from the sampling distribution of the sample mean, one from each sample. The sampling distribution of a statistic is the distribution of possible values of that statistic for repeated samples of the same size from a population. That is, the statistic, the sample mean in our example, is a random variable, and the sampling distribution is the distribution for that random variable. Fortunately, we know quite a bit about the sampling distributions of the statistics that we will be most interested in.
For each of the following circumstances, explain whether the reported numerical quantity is a parameter or a statistic.
- For a sample of 20 married couples, there was a difference of 2.5 inches in the mean heights of the husbands and wives.
- The Census Bureau reported that, based on the 2000 Census, the median age of residents of Oklahoma was 35.3 years.
- The 2.5 inches is a statistic because it is based on a sample from the population of all married couples.
- The 35.3 years is a parameter. A census occurs when every unit in the population is contacted. The U.S. Census occurs every ten years and attempts to contact everyone in the United States.
For each of the following, specify whether x̄ or μ is the correct statistical notation for the mean described.
- A university administrator determines that the mean GPA of all students at the school is 2.4.
- The mean amount of money spent on groceries for one week was $80.40 for a random sample of 35 single adult men.
- Because all members of the population of interest (all students at the university) were included in finding the mean, the appropriate statistical notation is μ.
- Here, we have the sample mean x̄, which is an estimate of the mean amount spent by all single adult men (the population).
Sampling Distribution of the Sample Mean
Consider again measuring the arm span of 20 randomly selected college females. Suppose arm spans of college females are normally distributed with a mean of 65.5 inches and a standard deviation of 2.5 inches. Thus, about 68% of the population values will be between 63 and 68 inches (within one standard deviation of the mean), and about 95% will be between 60.5 and 70.5 inches (within two standard deviations).
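As a quick check of these percentages, the following minimal sketch computes the share of a normal population with mean 65.5 and standard deviation 2.5 that falls in each interval. It uses Python's standard `statistics.NormalDist`; the variable names are illustrative and are not taken from the study guide.

```python
from statistics import NormalDist

# Population model from the text: arm spans ~ Normal(mean 65.5, SD 2.5), in inches.
arm_span = NormalDist(mu=65.5, sigma=2.5)

# Proportion of the population within one and within two standard deviations.
within_one_sd = arm_span.cdf(68.0) - arm_span.cdf(63.0)
within_two_sd = arm_span.cdf(70.5) - arm_span.cdf(60.5)

print(f"Between 63 and 68 inches:     {within_one_sd:.4f}")
print(f"Between 60.5 and 70.5 inches: {within_two_sd:.4f}")
```

The two proportions come out close to 0.68 and 0.95, which is why these figures are quoted as approximations.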
Suppose we take repeated samples of size 20. The samples, and thus the sample means, tend to vary with sample. Following are three samples that could have been drawn, where all measurements are in inches.
Sample 1: 67.7, 62.3, 66.6, 67.0, 62.3, 66.1, 61.2, 65.8, 67.0, 64.6, 60.4, 64.8, 66.7, 63.9, 65.2, 68.9, 63.1, 66.0, 67.3, 71.1
Sample 2: 67.5, 66.3, 66.7, 64.8, 66.5, 64.6, 68.5, 62.5, 64.9, 64.1, 68.9, 68.2, 62.9, 63.3, 64.5, 66.2, 66.4, 65.7, 66.8, 64.7
Sample 3: 63.9, 65.1, 68.7, 67.2, 60.2, 63.4, 63.9, 63.6, 68.0, 65.5, 65.6, 62.6, 63.9, 68.6, 65.6, 63.5, 66.3, 59.9, 67.9, 68.0
The sample means for samples 1, 2, and 3 are 65.4, 65.7, and 65.07 inches, respectively. None equals the population mean (65.5 inches) exactly, and no two are equal to each other, but all are fairly close to the population mean.
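The three sample means can be reproduced directly from the listed measurements. The sketch below is one way to do it in Python; the variable names are illustrative.

```python
from statistics import mean

# The three samples of 20 arm spans (inches) listed in the text.
sample_1 = [67.7, 62.3, 66.6, 67.0, 62.3, 66.1, 61.2, 65.8, 67.0, 64.6,
            60.4, 64.8, 66.7, 63.9, 65.2, 68.9, 63.1, 66.0, 67.3, 71.1]
sample_2 = [67.5, 66.3, 66.7, 64.8, 66.5, 64.6, 68.5, 62.5, 64.9, 64.1,
            68.9, 68.2, 62.9, 63.3, 64.5, 66.2, 66.4, 65.7, 66.8, 64.7]
sample_3 = [63.9, 65.1, 68.7, 67.2, 60.2, 63.4, 63.9, 63.6, 68.0, 65.5,
            65.6, 62.6, 63.9, 68.6, 65.6, 63.5, 66.3, 59.9, 67.9, 68.0]

# One sample mean per sample, rounded to two decimals for display.
sample_means = [round(mean(s), 2) for s in (sample_1, sample_2, sample_3)]
print(sample_means)
```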
This process of selecting samples of size 20 was repeated 10,000 times using computer simulation, and the histogram of the simulated sampling distribution of sample means is presented in Figure 12.1. In addition, the normal distribution that best fits the data is superimposed on the histogram. Notice that the normal curve provides a good representation of the histogram. The sample mean of the 10,000 sample means (the average of the sample means from all 10,000 samples of size 20) is 65.02, very close to the population mean of 65.5, and the sample standard deviation is 0.559. In addition, almost all of the values, not just 68% of them, are between 63 and 68 inches.
The sample standard deviation of the sample means is much smaller than the population standard deviation. Why? The population standard deviation is a measure of the spread in the arm span measurements of college females; it is a measure of the spread in the distribution of the random variable X, the arm span of a college female measured in inches. The sample standard deviation of the sample means based on samples of size 20 is a measure of the spread of the sample means, not the individuals; it is a measure of the spread in the sampling distribution of the random variable x̄, the sample mean of the arm spans of 20 college females measured in inches.
What would happen if the sample size changed? To find out, we simulated 10,000 samples of size 50 from the same normal distribution with a mean of 65.5 and a standard deviation of 2.5. The sample mean was computed for each sample, and a histogram of this simulated sampling distribution is shown in Figure 12.2. Although the two graphs look similar at first glance, notice that the range of the sample means is quite different. For this set of 10,000 sample means, the average of the sample means is 65.494, and the sample standard deviation is 0.353, which is substantially less than the sample standard deviation of the sample means when the sample size was 20.
If a random sample of size n is taken from a normal distribution with mean μ and standard deviation σ, the sampling distribution of the sample mean x̄ is normal with mean μ and standard deviation σ/√n. That is, the sampling distribution of x̄ is centered at the same value as the population distribution, but it has less spread. The spread in the sampling distribution, measured by σ/√n, decreases as the sample size increases. When we simulated the sampling distribution by generating 10,000 samples of size n = 20, the sample standard deviation of the simulated x̄ values was 0.559. We believe that it should be σ/√n = 2.5/√20 = 0.55902. (Here, we carried more decimal places than usual to aid the comparison.) These are really close! What about when n = 50? We have σ/√n = 2.5/√50 = 0.35355, compared with the simulated value of 0.353. Again, there is very good agreement with the simulated sampling distribution! This does not prove the statements are true; such a proof requires methods beyond this book. It does help us feel comfortable that the formula works.
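The comparison between simulated and theoretical spread can be re-run with a short script. The following is a minimal sketch, not the simulation used to produce the figures quoted above: it assumes Python's standard `random` and `statistics` modules, and the seed and function name are arbitrary choices made for illustration. Because the draws are random, the simulated values will differ slightly from those in the text.

```python
import math
import random
from statistics import fmean, stdev

MU, SIGMA = 65.5, 2.5   # population mean and standard deviation from the text
N_SAMPLES = 10_000      # number of repeated samples, as in the text

def simulate_sample_means(n, rng):
    """Return N_SAMPLES sample means, each from n draws of Normal(MU, SIGMA)."""
    return [fmean(rng.gauss(MU, SIGMA) for _ in range(n))
            for _ in range(N_SAMPLES)]

rng = random.Random(12345)  # arbitrary seed, used only for reproducibility
results = {}
for n in (20, 50):
    means = simulate_sample_means(n, rng)
    # Store: average of the sample means, their sample SD, and sigma / sqrt(n).
    results[n] = (fmean(means), stdev(means), SIGMA / math.sqrt(n))
    avg, sd, theory = results[n]
    print(f"n = {n}: mean of sample means = {avg:.3f}, "
          f"SD of sample means = {sd:.3f}, sigma/sqrt(n) = {theory:.5f}")
```

For both sample sizes, the average of the simulated sample means should land near 65.5 and their standard deviation near σ/√n.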
Notice how close the average of the 10,000 x̄ values was to the population mean μ for samples of size n = 20 and n = 50, just as predicted. The fact that the mean of the sampling distribution of the sample mean is equal to the population mean indicates that the sample mean x̄ is an unbiased estimator of the population mean μ. In general, an unbiased statistic is a statistic with mean value equal to the value of the population characteristic being estimated. We usually want to use an unbiased statistic to estimate a population parameter of interest.
If x̄ has a normal distribution with mean μ and standard deviation σ/√n, then by the properties of the normal distribution, we know that z = (x̄ − μ)/(σ/√n) has a standard normal distribution. That is, the sampling distribution of z is a standard normal. Thus, we can use the standard normal tables to find probabilities that the sample mean falls in intervals of interest, such as the probability the sample mean exceeds a specified value or is between two values, just as we did in Lesson 11.
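To illustrate the standardization, the sketch below finds the probability that the mean arm span of a sample of 20 exceeds a cutoff. The cutoff of 66 inches is a hypothetical value chosen for illustration and does not come from the study guide; `statistics.NormalDist` stands in for the standard normal table.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 65.5, 2.5, 20   # population values and sample size from the text
THRESHOLD = 66.0               # hypothetical cutoff in inches (an assumption)

# Standard deviation of the sample mean, sigma / sqrt(n).
standard_error = SIGMA / math.sqrt(N)

# Standardize the cutoff: z = (x_bar - mu) / (sigma / sqrt(n)).
z = (THRESHOLD - MU) / standard_error

# P(sample mean > THRESHOLD) = 1 - Phi(z), with Phi the standard normal CDF.
p_exceeds = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"P(sample mean > {THRESHOLD}) = {p_exceeds:.4f}")
```

The probability for an interval between two values follows the same pattern: standardize both endpoints and subtract the two cumulative probabilities.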