Simple Linear Regression for AP Statistics

By — McGraw-Hill Professional
Updated on Feb 4, 2011

Let's distinguish between statistics and parameters. Statistics are measurements or values that describe samples, and parameters are measurements that describe populations. We have also seen that statistics can be used to estimate parameters. Thus, we use the sample mean x̄ to estimate the population mean μ, the sample standard deviation s to estimate the population standard deviation σ, etc. The least-squares regression line, ŷ = a + bx, is based on a set of ordered pairs; ŷ is actually a statistic because it is based on sample data. In this chapter, we study the parameter, μy, that is estimated by ŷ.

Before we look at the model for linear regression, let's consider an example:

example: The following data are pulse rates and heights for a group of 10 female statistics students:

  1. What is the least-squares regression line for predicting pulse rate from height?
  2. What is the correlation coefficient between height and pulse rate? Interpret the correlation coefficient in the context of the problem.
  3. What is the predicted pulse rate of a 67" tall student?
  4. Interpret the slope of the regression line in the context of the problem.


  1. Predicted pulse rate = 47.17 + 0.302(Height). (Done on the TI-83/84 with Height in L1 and Pulse in L2, the LSRL can be found with STAT CALC LinReg(a+bx) L1,L2,Y1.)
  2. r = 0.21. There is a weak, positive, linear relationship between Height and Pulse rate.
  3. Predicted pulse rate = 47.17 + 0.302(67) = 67.4. (On the TI-83/84: Y1(67) = 67.42. Remember that you can paste Y1 to the home screen by entering VARS Y-VARS Function Y1.)
  4. For each increase in height of one inch, the pulse rate is predicted to increase by 0.302 beats per minute (or: the pulse rate will increase, on average, by 0.302 beats per minute).
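The least-squares calculations behind answers 1–3 can be sketched in a few lines of Python. The height and pulse values below are illustrative stand-ins (the original data table did not accompany this excerpt), so the fitted numbers will differ from 47.17 and 0.302; the formulas are the standard ones, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄.

```python
# Illustrative data only -- NOT the original dataset from the example above.
heights = [61, 62, 63, 64, 65, 66, 67, 68, 69, 70]
pulses  = [68, 64, 70, 66, 72, 65, 71, 69, 74, 70]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(pulses) / n

# Slope b = Sxy / Sxx, intercept a = ybar - b * xbar
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, pulses))
sxx = sum((x - mean_x) ** 2 for x in heights)
b = sxy / sxx
a = mean_y - b * mean_x

# Correlation coefficient r = Sxy / sqrt(Sxx * Syy)
syy = sum((y - mean_y) ** 2 for y in pulses)
r = sxy / (sxx * syy) ** 0.5

print(f"yhat = {a:.2f} + {b:.3f}x, r = {r:.2f}")
print(f"predicted pulse at 67 inches: {a + b * 67:.1f}")
```

A useful check on any LSRL computation: the fitted line always passes through the point (x̄, ȳ).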

When doing inference for regression, we use ŷ = a + bx to estimate the true population regression line. Similar to what we have done with other statistics used for inference, we use a and b as estimators of the population parameters α and β, the intercept and slope of the population regression line. The conditions necessary for doing inference for regression are:

  • For each given value of x, the response y-values are independent and normally distributed.
  • For each given value of x, the standard deviation, σ, of y-values is the same.
  • The mean responses of the y-values for the fixed values of x are linearly related by the equation μy = α + βx.

example: Consider a situation in which we are interested in how well a person scores on an agility test after a fixed number of 3-oz. glasses of wine. Let x be the number of glasses consumed. Let x take on the values 1, 2, 3, 4, 5, and 6. Let y be the score on the agility test (scale: 1–100). Then for any given value xi, there will be a distribution of y-values with mean μyi. The conditions for inference for regression are that (i) each of these distributions of y-values is normally distributed, (ii) each of these distributions of y-values has the same standard deviation σ, and (iii) each of the means μyi lies on a line.
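The model in this example can be simulated directly: pick values for α, β, and σ (the ones below are assumed purely for illustration), and for each fixed x draw y-values from a normal distribution whose mean lies on the line μy = α + βx and whose standard deviation is the same σ at every x.

```python
import random

random.seed(0)

# Hypothetical population parameters, assumed for illustration only:
# agility score starts near 90 and drops about 8 points per glass.
alpha, beta, sigma = 90.0, -8.0, 5.0

# For each fixed x, the model says y ~ Normal(alpha + beta*x, sigma).
samples = {}
for x in [1, 2, 3, 4, 5, 6]:
    mu_y = alpha + beta * x                  # mean response lies on the line
    samples[x] = [random.gauss(mu_y, sigma) for _ in range(1000)]

for x, ys in samples.items():
    # Sample means should track the line alpha + beta*x closely.
    print(x, round(sum(ys) / len(ys), 2))
```

Running this shows condition (iii) in action: the average y at each x hugs the line, while the spread around each mean (condition ii) stays the same σ.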

Remember that a residual is the error involved when making a prediction from a regression equation (residual = actual value of y – predicted value of y = yi – ŷi). Not surprisingly, the standard error of the predictions is a function of the squared residuals:

s = √[Σ(yi – ŷi)² / (n – 2)]

s is an estimator of σ, the standard deviation of the residuals. Thus, there are actually three parameters to worry about in regression: α, β, and σ, which are estimated by a, b, and s, respectively.
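A minimal sketch of computing s, using a small illustrative dataset (not from the source). Note the divisor is n – 2, not n – 1, because fitting the line uses up two degrees of freedom (one each for a and b).

```python
# Illustrative data, roughly linear but not exactly.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Fit the LSRL: b = Sxy / Sxx, a = ybar - b * xbar
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

# Residuals e_i = y_i - yhat_i, then s = sqrt(sum(e^2) / (n - 2))
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s = (sum(e * e for e in residuals) / (n - 2)) ** 0.5
print(f"s = {s:.3f}")
```

A built-in sanity check: for a least-squares line with an intercept, the residuals always sum to zero.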

The final statistic we need to do inference for regression is the standard error of the slope of the regression line:

sb = s / √[Σ(xi – x̄)²]

In summary, inference for regression depends upon estimating μy = α + βx with ŷ = a + bx. For each x, the response values of y are independent and follow a normal distribution, each distribution having the same standard deviation. Inference for regression depends on the following statistics:

  • a, the estimate of the y intercept, α, of μy
  • b, the estimate of the slope, β, of μy
  • s, the standard error of the residuals
  • sb, the standard error of the slope of the regression line
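All four statistics in the list above can be computed together. The sketch below continues the wine/agility scenario with made-up y-values (the source gives no data), and also forms the t statistic b / sb that the significance test in the next section is built on.

```python
# Made-up agility scores for x = 1..6 glasses (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [88, 72, 70, 58, 54, 41]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)

# a and b: estimates of alpha and beta
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x

# s: standard error of the residuals, divisor n - 2
s = (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5

# s_b: standard error of the slope
s_b = s / sxx ** 0.5

# t statistic for H0: beta = 0, with n - 2 degrees of freedom
t = b / s_b
print(f"b = {b:.2f}, s_b = {s_b:.3f}, t = {t:.2f}")
```

Here b should come out negative (agility falls as glasses increase), and the size of |t| relative to a t distribution with n – 2 = 4 degrees of freedom is what the next section's test will judge.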

In the section that follows, we explore inference for the slope of a regression line in terms of a significance test and a confidence interval for the slope.
