Practice problems for these concepts can be found at:

- Two-Variable Data Analysis Multiple Choice Practice Problems for AP Statistics
- Two-Variable Data Analysis Free Response Practice Problems for AP Statistics
- Two-Variable Data Analysis Review Problems for AP Statistics
- Two-Variable Data Analysis Rapid Review for AP Statistics

When we discussed correlation, we learned that it didn't matter which variable we called *x* and which variable we called *y*—the correlation *r* is the same. That is, there is no explanatory and response variable, just two variables that may or may not vary linearly. In this section we will be more interested in predicting, once we've determined the strength of the linear relationship between the two variables, the value of one variable (the response) based on the value of the other variable (the explanatory). In this situation, called linear regression, it matters greatly which variable we call *x* and which one we call *y*.

### Least-Squares Regression Line

Recall again the data from the study that looked at hours studied versus score on test:

For these data, *r* = 0.864, so we have a strong, positive, linear association between the variables. Suppose we wanted to predict the score of a person who studied for 2.75 hours. If we knew we were working with a linear model—a line that seemed to fit the data well—we would feel confident about using the equation of the line to make such a prediction. We are looking for a **line of best fit**. We want to find a **regression line**—a line that can be used for predicting response values from explanatory values. In this situation, we would use the regression line to predict the exam score for a person who studied 2.75 hours.

The line we are looking for is called the least-squares regression line. We could draw a variety of lines on our scatterplot trying to determine which has the best fit. Let ŷ be the predicted value of *y* for a given value of *x*. Then *y* – ŷ represents the error in prediction. We want our line to minimize errors in prediction, so we might first think that ∑(*y* – ŷ) would be a good measure (*y* – ŷ is the *actual value* minus the *predicted value*). However, because our line is going to average out the errors in some fashion, we find that ∑(*y* – ŷ) = 0. To get around this problem, we use ∑(*y* – ŷ)^{2}. This expression will vary with different lines and is sensitive to the fit of the line. That is, ∑(*y* – ŷ)^{2} is small when the linear fit is good and large when it is not.
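This residual argument can be checked numerically. The sketch below uses hypothetical data (the chapter's table is not reproduced here) and the standard least-squares formulas; it shows that the plain prediction errors sum to essentially zero, while the squared errors give a usable measure of fit.

```python
# Hypothetical data -- stand-ins for the chapter's hours/score table.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept (standard formulas).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals))                   # ~0: the plain errors cancel out
print(sum(r ** 2 for r in residuals))   # SSE: what the LSRL minimizes
```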

The **least-squares regression line** (LSRL) is the line that minimizes the sum of squared errors. If ŷ = *a* + *bx* is the LSRL, then ŷ minimizes ∑(*y* – ŷ)^{2}.

For *n* ordered pairs (*x*, *y*), we calculate x̄, ȳ, *s*_{x}, *s*_{y}, and *r*. Then we have:

*b* = *r*(*s*_{y}/*s*_{x}) and *a* = ȳ – *b*x̄.

Example: For the hours studied (*x*) versus score (*y*) study, the LSRL is ŷ = 59.03 + 6.77*x*. We asked earlier what score we would predict for someone who studied 2.75 hours. Plugging this value into the LSRL, we have ŷ = 59.03 + 6.77(2.75) = 77.63. It's important to understand that this is the *predicted* value, not the exact value such a person will necessarily get.

Example: Consider once again the computer printout for the data of the preceding example:

The regression equation is given as "Score = 59 + 6.77 Hours." The *y*-intercept, which is the predicted score when the number of hours studied is zero, and the slope of the regression line are listed in the table under the column "Coef."

Example: We saw earlier that the calculator output for these data was

The values of *a* and *b* are given as part of the output. Remember that these values were obtained by putting the "Hours Studied" data in L1, the "Test Score" data in L2, and doing LinReg (ax+b) L1,L2. When using LinReg (ax+b), the explanatory variable *must* come first and the response variable second.

Example: An experiment is conducted on the effects of having convicted criminals provide restitution to their victims rather than serving time. The following table gives the data for 10 criminals. The monthly salaries (*X*) and monthly restitution payments (*Y*) were as follows:

- Find the correlation between *X* and *Y* and the regression equation that can be used to predict monthly restitution payments from monthly salaries.
- Draw a scatterplot of the data and put the LSRL on the graph.
- Interpret the slope of the regression line in the context of the problem.
- How much would a criminal earning $1400 per month be expected to pay in restitution?

solution: Put the monthly salaries (*x*) in L1 and the monthly restitution payments (*y*) in L2. Then enter STAT CALC LinReg(a+bx) L1,L2,Y1.

- *r* = 0.97, Payments = –56.22 + 0.46(Salary). (If you answered ŷ = –56.22 + 0.46*x*, you must define *x* and *y* so that the regression equation can be understood in the context of the problem. An algebra equation, without a contextual definition of the variables, will not receive full credit.)
- The slope of the regression line is 0.46. This tells us that, for each $1 increase in the criminal's salary, the amount of restitution is predicted to increase by $0.46. Or you could say that the average increase is $0.46.
- *Payment* = –56.22 + 0.46(1400) = $587.78.
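The last prediction can be replayed in one line of code. This sketch simply encodes the fitted equation from the solution above; it does not refit the model, and the $1400 salary is the example's value:

```python
# Fitted equation from the example: payment = -56.22 + 0.46 * salary
def predicted_payment(salary):
    """Predicted monthly restitution payment for a given monthly salary."""
    return -56.22 + 0.46 * salary

print(f"${predicted_payment(1400):.2f}")  # matches the $587.78 above
```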
