
Estimation Help

By — McGraw-Hill Professional
Updated on Aug 26, 2011

Sampling Distributions

Here's a problem we haven't yet considered. Suppose, in the bulb-testing scenario, our sample consists of 1000 randomly selected bulbs, and we get the results illustrated by Figs. 5-4 and 5-5. What if we repeat the experiment, again choosing a sample consisting of 1000 randomly selected bulbs? We won't get the same 1000 bulbs as we did the first time, so the results of the experiment will be a little different.

Fig. 5-4. Estimating the mean.

Fig. 5-5. Estimating the standard deviation.

Suppose we do the experiment over and over. We'll get a different set of bulbs every time. The results of each experiment will be almost the same, but they will not be exactly the same. There will be a tiny variation in the estimate of the mean from one experiment to another. Likewise, there will be a tiny variation in the estimate of the standard deviation. This variation from experiment to experiment will be larger if the sample size is smaller (say 100 bulbs), and the variation will be smaller if the sample size is larger (say 10,000 bulbs).

Imagine that we repeat the experiment indefinitely, estimating the mean again and again. As we do this and plot the results, we obtain a distribution that shows how the mean varies from sample to sample. Figure 5-6 shows what this curve might look like. It is a normal distribution, but its values cluster much more tightly around 3.600 amperes than the currents of the individual bulbs do. We might also plot a distribution that shows how the standard deviation varies from sample to sample. Again we get a normal distribution; its values are closely clustered around 0.23 ampere, as shown in Fig. 5-7.

Fig. 5-6. Sampling distribution of means.

Fig. 5-7. Sampling distribution of standard deviations.

Figures 5-6 and 5-7 are examples of what we call a sampling distribution. Figure 5-6 shows a sampling distribution of means. Figure 5-7 illustrates a sampling distribution of standard deviations. If our experiments involved the testing of more than 1000 bulbs, these distributions would be more concentrated (more sharply peaked curves), indicating less variability from experiment to experiment. If our experiments involved the testing of fewer than 1000 bulbs, the distributions would be less concentrated (flatter curves), indicating greater variability from experiment to experiment.
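A short simulation can make this concrete. The Python sketch below is an illustration only; it assumes, purely for the sake of the simulation, that bulb currents are roughly normally distributed with mean 3.600 amperes and standard deviation 0.23 ampere (the estimates quoted above), then repeats the experiment many times for several sample sizes:

```python
import random
import statistics

# Assumed stand-in for the bulb population (illustrative only): currents that
# are roughly normal with mean 3.600 A and standard deviation 0.23 A.
def draw_sample(n):
    return [random.gauss(3.600, 0.23) for _ in range(n)]

def sampling_distribution(n, experiments=1000):
    """Repeat the experiment; record the mean and standard deviation
    estimated from each sample of n bulbs."""
    means, stdevs = [], []
    for _ in range(experiments):
        sample = draw_sample(n)
        means.append(statistics.mean(sample))
        stdevs.append(statistics.stdev(sample))
    return means, stdevs

for n in (100, 1000, 10_000):
    means, stdevs = sampling_distribution(n)
    print(f"sample size {n:6d}: means cluster near "
          f"{statistics.mean(means):.3f} A (spread {statistics.stdev(means):.4f} A); "
          f"standard deviations cluster near {statistics.mean(stdevs):.2f} A")
```

Running a sketch like this shows the estimated means clustering around 3.600 amperes and the estimated standard deviations around 0.23 ampere, with the clustering tightening as the sample size grows from 100 to 10,000, which is exactly the behavior Figs. 5-6 and 5-7 depict.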

The Central Limit Theorem

Imagine a population P in which some characteristic x can vary from element (or individual) to element. Suppose P contains p elements, and p is a very large number. The value x is plotted on the horizontal axis of a graph, and the number y of individuals with characteristic value x is plotted on the vertical axis. The result is a statistical distribution. Maybe it's a normal distribution (bell-shaped and symmetrical), and maybe not. The number of elements p in the population P is so large that it's easiest to render the distribution as a smooth curve.

Now imagine that we choose a large number, k, of samples from P. Each sample represents a different random cross-section of P, but all the samples are the same size. Each of the k samples contains n elements, where n < p. We find the mean of each sample and compile all these means into a set {μ1, μ2, μ3, . . ., μk}. We then plot these means on a graph. We end up with a sampling distribution of means. We've been through this discussion with the example involving the light bulbs, and now we're stating it in general terms. We're repeating this concept because it leads to something important known as the Central Limit Theorem.

According to the first part of the Central Limit Theorem, the sampling distribution of means is a normal distribution if the distribution for P is normal. If the distribution for P is not normal, then the sampling distribution of means approaches a normal distribution as the sample size n increases. Even if the distribution for P is highly skewed (asymmetrical), any sampling distribution of means is more nearly normal than the distribution for P. It turns out that if n ≥ 30, then even if the distribution for P is highly skewed and p is gigantic, for all practical purposes the sampling distribution of means is a normal distribution.
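To see the first part of the theorem at work, we can sample from a deliberately skewed population. The sketch below (Python; the exponential population is an arbitrary skewed example chosen for illustration, not anything from the bulb scenario) draws 5000 samples for several sample sizes and summarizes the resulting sampling distributions of means:

```python
import random
import statistics

# A deliberately skewed population for illustration: exponential with mean 1.
def skewed_value():
    return random.expovariate(1.0)

def sample_means(n, k=5000):
    """Means of k random samples, each containing n elements."""
    return [statistics.mean(skewed_value() for _ in range(n)) for _ in range(k)]

for n in (2, 5, 30):
    means = sample_means(n)
    print(f"n = {n:2d}: mean of sample means = {statistics.mean(means):.3f}, "
          f"spread of sample means = {statistics.stdev(means):.3f}")
```

Plotting a histogram of the collected means for n = 30 gives a nearly symmetrical, bell-shaped curve even though the population itself is strongly skewed; for n = 2 the lopsidedness of the population is still plainly visible in the sampling distribution.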

The second part of the Central Limit Theorem concerns the standard deviation of the sampling distribution of means. Let σ be the standard deviation of the distribution for some population P. Let n be the size of the samples of P for which a sampling distribution of means is determined. Then the standard deviation of the sampling distribution of means, more often called the standard error of the mean (SE), can be found with the following formula:

SE ≈ σ/√n

That is, SE is approximately equal to the standard deviation of the distribution for P, divided by the square root of the number of elements in each sample. If the distribution for P is normal, or if n ≥ 30, then we can consider the formula exact:

SE = σ/√n

From this it can be seen that as the value of n increases, the value of SE decreases. This reflects the fact that large samples, in general, produce more accurate experimental results than small samples. The returns diminish, however: because SE falls off only as the square root of n, quadrupling the sample size merely cuts the standard error in half.
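As a rough worked example using the figures from the bulb scenario above (and treating the estimated standard deviation of about 0.23 ampere as a stand-in for σ): with samples of n = 1000 bulbs, SE ≈ 0.23/√1000 ≈ 0.007 ampere. A spread that small is consistent with the very tight clustering of sample means around 3.600 amperes suggested by Fig. 5-6. Shrinking the sample to n = 100 would roughly triple the standard error, to about 0.023 ampere.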

Practice problems for these concepts can be found at:

Sampling and Estimation Practice Test
