Practice problems for these concepts can be found at:
As we proceed in statistics, our interest turns to estimating unknown population values. We have previously described a statistic as a value that describes a sample and a parameter as a value that describes a population. Now we want to use a statistic as an estimate of a parameter. We know that if we draw multiple samples and compute some statistic of interest, say the sample mean x̄, we will likely get different values each time even though the samples are all drawn from a population with a single mean, μ. What we now do is develop a process by which we will use our estimate to generate a range of likely population values for the parameter. The statistic itself is called a point estimate, and the range of likely population values from which we might have obtained our estimate is called a confidence interval.
example: We do a sample survey and find that 42% of the sample plans to vote for Normajean for student body treasurer. That is, p̂ = 0.42. Based on this, we generate an interval of likely values (the confidence interval) for the true proportion of students who will vote for Normajean and find that between 38% and 46% of the students are likely to vote for Normajean. The interval (0.38, 0.46) is a confidence interval for the true proportion who will vote for Normajean.
Note that saying a confidence interval is likely to contain the true population value is not to say that it necessarily does. It may or may not—we will see ways to quantify just how "confident" we are in our interval.
In this chapter, we will construct confidence intervals for a single mean, the difference between two means, a single proportion, and the difference between two proportions. Our ability to construct confidence intervals depends on our understanding of the sampling distributions for each of the parameters. Similar arguments exist for the sampling distributions of the difference between two means or the difference between two proportions.
t Procedures
In discussing the sampling distribution of x̄, we assumed that we knew the population standard deviation. This is a big and questionable assumption: if we knew the population standard deviation, we would probably also know the population mean and, if so, why would we be calculating sample estimates of μ? What saves us, of course, is the central limit theorem, which tells us that the sampling distribution of x̄ is approximately normal when the sample size, n, is large enough (roughly, n ≥ 30). If the original population is approximately normal, or the sample size is "large," there are techniques similar to z-procedures for analyzing sampling distributions of sample means. In fact, some texts simply use z-procedures in this situation even though the population standard deviation is unknown. However, the resulting distribution is not normal, and it is better to employ other procedures. To do this, we use the sample standard deviation s as an estimate of the population standard deviation σ. That is, σ̂ = s.
When we estimate a standard deviation from data, we call the estimator the standard error (some texts define the standard error as the standard deviation of the sampling distribution).
In this case, then, s/√n is the standard error for x̄.
We will need the standard error for each different statistic we will use to generate a confidence interval. (A mnemonic device to remember what standard error stands for is: we are estimating the standard deviation but, because we are estimating it, there will probably be some error.) We will use this term from now on as we study inference because we will always be estimating the unknown standard deviation.
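As a quick illustration, the standard error of the mean can be computed directly from a sample. This is a minimal sketch using only Python's standard library; the data values are made up for illustration:

```python
import math
import statistics

def standard_error_of_mean(data):
    """Estimate the standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
    n = len(data)
    return s / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical sample data
se = standard_error_of_mean(sample)          # ≈ 0.756 for this sample
```

Note that `statistics.stdev` computes the sample (n − 1) standard deviation, which is exactly the s used in the standard error formula.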
When n is small, we cannot safely assume the sampling distribution of x̄ is approximately normal. Under certain conditions (see below), the statistic (x̄ − μ)/(s/√n) follows a t distribution, which is similar in many respects to the normal distribution but which, because of the error involved in using s to estimate σ, is more variable. How much more variable depends on the sample size. The t statistic is given by t = (x̄ − μ)/(s/√n).
This statistic follows a t distribution if the following are true.
- The population from which the sample was drawn is approximately normal, or the sample is large enough (rule of thumb: n ≥ 30).
- The sample is an SRS from the population.
There is a different t distribution for each n. The distribution is determined by the number of degrees of freedom, df = n – 1. We will use the symbol t(k) to identify the t distribution with k degrees of freedom.
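The t statistic and its degrees of freedom can be computed in a few lines of code. A sketch, assuming a hypothesized population mean mu0 (the sample values here are invented):

```python
import math
import statistics

def t_statistic(data, mu0):
    """Compute t = (xbar - mu0) / (s / sqrt(n)), along with df = n - 1."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1          # the statistic and its degrees of freedom

t, df = t_statistic([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)   # hypothetical data
```

For this sample (x̄ = 5, s/√n ≈ 0.756), t ≈ 1.32 with 7 degrees of freedom.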
As n increases, the t distribution gets closer to the normal distribution. We can see this in the following graphic.

[Graphic: t distributions for increasing degrees of freedom, approaching the normal curve.]
The table used for t values is set up differently than the table for z. In Table A, the marginal entries are z-scores, and the table entries are the corresponding areas under the normal curve to the left of z. In the t table, Table B, the left-hand column is degrees of freedom, the top margin gives upper tail probabilities, and the table entries are the corresponding
critical values of t required to achieve the probability. In this book, we will use t* (or z*) to indicate critical values.
example: For 12 df and an upper tail probability of 0.05, we see that the critical value of t is 1.782 (t* = 1.782). For an upper tail probability of 0.02, the corresponding critical value is 2.303 (t* = 2.303).
example: For 1000 df, the critical value of t for an upper tail probability of 0.025 is 1.962 (t* = 1.962). This is very close to the critical z-value for an upper tail probability of 0.025, which is 1.96 (z* = 1.96).
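The closeness of t* and z* for large df can be checked directly. Python's standard library has an exact inverse normal CDF (`statistics.NormalDist.inv_cdf`), which plays the role of the calculator's invNorm; the t* values below are the ones quoted from Table B in the examples above:

```python
from statistics import NormalDist

# Critical values of t quoted in the examples above (Table B).
t_star_12_05 = 1.782      # t*, 12 df, upper tail probability 0.05
t_star_1000_025 = 1.962   # t*, 1000 df, upper tail probability 0.025

# For large df, t* is nearly the normal critical value z*.
z_star_025 = NormalDist().inv_cdf(1 - 0.025)   # area 0.975 to the left
```

Here z_star_025 is about 1.960, within 0.002 of the t* for 1000 df.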
General Form of a Confidence Interval
A confidence interval is composed of two parts: an estimate of a population value and a margin of error. We identify confidence intervals by how confident we are that they contain the true population value.
A level C confidence interval has the following form: (estimate) ± (margin of error). In turn, the margin of error is composed of two parts: the critical value of z or t (which depends on the confidence level C) and the standard error. Hence, all confidence intervals take the form: (estimate) ± (margin of error) = (estimate) ± (critical value)(standard error).
A t confidence interval for μ takes the form: x̄ ± t*(s/√n), where t* depends on C, the confidence level; s is the sample standard deviation; and n is the sample size.
The confidence level is often expressed as a percent: a 95% confidence interval means that C = 0.95, or a 99% confidence interval means that C = 0.99. Although any value of C can be used as a confidence level, typical levels are 0.90, 0.95, and 0.99.
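Putting the pieces together, a level C t interval can be sketched as a short function. Here the critical value t* is supplied by the caller (e.g., looked up in Table B); the sample values in the usage line are hypothetical, and the t* of 2.977 is the 99%, df = 14 value worked out in the example below:

```python
import math
import statistics

def t_interval(data, t_star):
    """(estimate) ± (critical value)(standard error) for a population mean."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = t_star * se
    return xbar - margin, xbar + margin

# Hypothetical sample of size 15; t* = 2.977 gives a 99% interval (df = 14).
data = [5, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 14, 15, 16]
lo, hi = t_interval(data, 2.977)
```

The interval is centered at x̄, so the point estimate always sits in the middle of the confidence interval.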
IMPORTANT: When we say that "We are 95% confident that the true population value lies in an interval," we mean that the process used to generate the interval will capture the true population value 95% of the time. We are not making any probability statement about the interval. Our "confidence" is in the process that generated the interval. We do not know whether the interval we have constructed contains the true population value or not—it either does or it doesn't. All we know for sure is that, on average, 95% of the intervals so constructed will contain the true value.
example: Floyd told Betty that the probability was 0.95 that the 95% confidence interval he had constructed contained the mean of the population. Betty corrected him by saying that his interval either does contain the value (P = 1) or it doesn't (P = 0). This interval could be one of the 95 out of every 100 on average that do contain the population mean, or it might be one of the 5 out of every 100 that do not. Remember that probability values apply to the expected relative frequency of future events, not events that have already occurred.
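The "confidence is in the process" idea can be demonstrated by simulation. The sketch below repeatedly draws samples from a normal population with known σ and checks how often the 95% z interval (z* = 1.96) captures μ; the population settings and trial count are arbitrary choices for illustration:

```python
import math
import random

random.seed(1)                       # fixed seed for reproducibility
mu, sigma, n, trials = 50, 10, 30, 2000
z_star = 1.96                        # 95% critical value
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    margin = z_star * sigma / math.sqrt(n)
    if xbar - margin <= mu <= xbar + margin:
        hits += 1
coverage = hits / trials             # proportion of intervals capturing mu
```

Any single interval either captures μ or it doesn't, but across many repetitions the coverage proportion settles near 0.95.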
example: Find the critical value of t required to construct a 99% confidence interval for a population mean based on a sample size of 15.
solution: To use the t distribution table (Table B in the Appendix), we need to know the upper-tail probability. Because C = 0.99, and confidence intervals are two sided, the upper-tail probability is (1 − 0.99)/2 = 0.005.
Looking in the row for df = 15 – 1 = 14, and the column for 0.005, we find t* = 2.977. Note that the table is set up so that if you look at the bottom of the table and find 99%, you are in the same column.
Using the newer version of the TI-84, the solution is given by invT(0.995,14) = 2.977.
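The arithmetic that turns a confidence level into the calculator's invT argument is worth making explicit; a small sketch:

```python
def invT_area(C):
    """Left-tail area passed to invT for a two-sided level-C interval."""
    tail = (1 - C) / 2     # probability in each tail
    return 1 - tail        # area to the left of t*

area = invT_area(0.99)     # 0.995, matching invT(0.995, 14) above
```

The same conversion applies to invNorm when constructing z intervals.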
example: Find the critical value of z required to construct a 95% confidence interval for a population proportion.
solution: We are reading from Table A, the table of Standard Normal Probabilities. Remember that table entries are areas to the left of a given z-score. With C = 0.95, we want (1 − 0.95)/2 = 0.025 in each tail, or an area of 0.975 to the left of z*. Finding 0.975 in the table, we have z* = 1.96. On the TI-83/84, the solution is given by invNorm(0.975) = 1.960.