Practice problems for these concepts can be found at:
 Two-Variable Data Analysis Practice Problems for AP Statistics
 Two-Variable Data Analysis Cumulative Review Problems for AP Statistics
 Two-Variable Data Analysis Rapid Review for AP Statistics
In the absence of a better way to predict y-values from x-values, our best guess for any given x might well be ȳ, the mean value of y.
Example: Suppose you had access to the heights and weights of each of the students in your statistics class. You compute the average weight of all the students. You write the heights of each student on a slip of paper, put the slips in a hat, and then draw out one slip. You are asked to predict the weight of the student whose height is on the slip of paper you have drawn. What is your best guess as to the weight of the student?
Solution: In the absence of any known relationship between height and weight, your best guess would have to be the average weight of all the students. You know the weights vary about the average and that is about the best you could do.
If we guessed at the weight of each student using the average, we would be wrong most of the time. If we took each of those errors and squared them, we would have what is called the sum of squares total (SST). It's the total squared error of our guesses when our best guess is simply the mean of the weights of all students, and represents the total variability of y.
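As a sketch of the computation just described (using hypothetical weights, since the original data aren't given), SST is simply the sum of the squared deviations of each weight from the mean weight:

```python
# Hypothetical weights (in pounds) for a small statistics class.
weights = [120, 135, 150, 160, 175]

# The mean weight: our "best guess" for any student when no other
# information is available.
y_bar = sum(weights) / len(weights)

# SST: the total squared error when every prediction is simply y-bar.
# This represents the total variability of y.
sst = sum((y - y_bar) ** 2 for y in weights)
print(y_bar, sst)  # 148.0 1830.0
```

Each student's weight is "guessed" to be 148 pounds; squaring and summing the resulting errors gives SST = 1830.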
Now suppose we have a least-squares regression line that we want to use as a model for predicting weight from height. It is, of course, the LSRL we discussed in detail earlier in this chapter, and our hope is that there will be less error in prediction than by using ȳ. Now, we still have errors from the regression line (called residuals, remember?). We call the sum of the squares of those residuals the sum of squared errors (SSE). So, SST represents the total error from using ȳ as the basis for predicting weight from height, and SSE represents the total error from using the LSRL. SST – SSE represents the benefit of using the regression line rather than ȳ for prediction. That is, by using the LSRL rather than ȳ, we have explained a certain proportion of the total variability by regression.
The proportion of the total variability in y that is explained by the regression of y on x is called the coefficient of determination. The coefficient of determination is symbolized by r^{2}. Based on the above discussion, we note that

r^{2} = (SST – SSE)/SST.
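This proportion can be computed directly from SST and SSE. A minimal sketch, again using hypothetical height/weight pairs, fits the least-squares line and compares the two error totals:

```python
# Hypothetical paired data: heights (inches) and weights (pounds).
heights = [62, 65, 68, 70, 73]
weights = [120, 140, 150, 155, 175]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares slope b = Sxy / Sxx and intercept a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
b = s_xy / s_xx
a = y_bar - b * x_bar

# SST: squared error predicting with y-bar alone.
sst = sum((y - y_bar) ** 2 for y in weights)

# SSE: squared error (sum of squared residuals) predicting with the LSRL.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))

# Coefficient of determination: proportion of variability explained.
r_squared = (sst - sse) / sst
print(sst, sse, r_squared)
```

Because the LSRL uses the height information, SSE comes out smaller than SST, and r² lands between 0 and 1; the closer it is to 1, the more of the variability in weight the regression on height explains.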
It can be shown algebraically, although it isn't easy to do so, that this r^{2} is actually the square of the familiar r, the correlation coefficient. Many computer programs will report the value of r^{2} only (usually as "R-sq"), which means that we must take the square root of r^{2} if we want to know r (remember that r and b, the slope of the regression line, are either both positive or both negative, so you can check the sign of b to determine the sign of r if all you are given is r^{2}). The TI-83/84 calculator will report both r and r^{2}, as well as the regression coefficients, when you do LinReg(a+bx).
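The relationship just described can be checked numerically. A sketch with hypothetical data: compute r directly from its definition, then recover it from r^{2} using the sign of the slope b, as you would if software reported only "R-sq":

```python
import math

# Hypothetical paired data (height in inches, weight in pounds).
heights = [62, 65, 68, 70, 73]
weights = [120, 140, 150, 155, 175]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Correlation coefficient r computed directly: r = Sxy / sqrt(Sxx * Syy).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)
r = s_xy / math.sqrt(s_xx * s_yy)

# Suppose software reported only r^2.  The square root alone loses the
# sign; the slope b recovers it, since r and b always share the same sign.
r_sq = r ** 2
b = s_xy / s_xx
r_recovered = math.sqrt(r_sq) if b > 0 else -math.sqrt(r_sq)

assert math.isclose(r, r_recovered)
```

For these (positively associated) data, b is positive, so the positive square root of r² is the correct value of r.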