Practice problems for these concepts can be found at:
- Two-Variable Data Analysis Practice Problems for AP Statistics
- Two-Variable Data Analysis Cumulative Review Problems for AP Statistics
- Two-Variable Data Analysis Rapid Review for AP Statistics
In the absence of a better way to predict y-values from x-values, our best guess for any given x might well be $\bar{y}$, the mean value of y.
Example: Suppose you had access to the heights and weights of each of the students in your statistics class. You compute the average weight of all the students. You write the heights of each student on a slip of paper, put the slips in a hat, and then draw out one slip. You are asked to predict the weight of the student whose height is on the slip of paper you have drawn. What is your best guess as to the weight of the student?
Solution: In the absence of any known relationship between height and weight, your best guess would have to be the average weight of all the students. You know the weights vary about the average and that is about the best you could do.
If we guessed the weight of each student using the average, we would be wrong most of the time. If we squared each of those errors and added the squares together, we would have what is called the sum of squares total (SST). It is the total squared error of our guesses when our best guess is simply the mean of the weights of all students, and it represents the total variability of y.
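As a minimal sketch of the SST calculation, the Python snippet below uses a small set of made-up weights (the data are hypothetical and not from the text). It also shows why the errors are squared: the raw errors about the mean always add up to zero.

```python
# Hypothetical weights (in pounds) for five students; the values are made up.
weights = [125, 130, 150, 165, 180]

# Our "best guess" for every student is the mean weight.
mean_weight = sum(weights) / len(weights)

# Error of each guess: actual weight minus the mean.
errors = [w - mean_weight for w in weights]

# The raw errors cancel out, so their sum tells us nothing about variability.
sum_of_errors = sum(errors)

# SST: square each error, then add the squares.
sst = sum(e ** 2 for e in errors)

print("mean weight:", mean_weight)
print("sum of raw errors:", sum_of_errors)
print("SST:", sst)
```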
Now suppose we have a least-squares regression line (LSRL) that we want to use as a model for predicting weight from height. It is, of course, the LSRL we discussed in detail earlier in this chapter, and our hope is that there will be less error in prediction than by using $\bar{y}$. We still have errors from the regression line (called residuals, remember?). We call the sum of the squares of those residuals the sum of squared errors (SSE). So SST represents the total squared error from using $\bar{y}$ as the basis for predicting weight, and SSE represents the total squared error from using the LSRL. SST - SSE represents the benefit of using the regression line rather than $\bar{y}$ for prediction. That is, by using the LSRL rather than $\bar{y}$, we have explained a certain proportion of the total variability by regression.
The proportion of the total variability in y that is explained by the regression of y on x is called the coefficient of determination. The coefficient of determination is symbolized by $r^2$. Based on the above discussion, we note that

$$r^2 = \frac{SST - SSE}{SST}$$
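To make the proportion (SST - SSE)/SST concrete, here is a short Python sketch. The heights and weights are hypothetical, and the slope and intercept come from the standard least-squares formulas rather than from any particular calculator or software package.

```python
# Hypothetical heights (inches) and weights (pounds); the values are made up.
heights = [62, 65, 68, 70, 75]
weights = [125, 130, 150, 165, 180]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares slope b and intercept a for the line y-hat = a + b*x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
b = sxy / sxx
a = y_bar - b * x_bar

# SST: total squared error when every prediction is the mean of y.
sst = sum((y - y_bar) ** 2 for y in weights)

# SSE: total squared error (squared residuals) when predictions come from the LSRL.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))

# Coefficient of determination: the share of SST removed by using the line.
r_squared = (sst - sse) / sst

print("SST:", sst)
print("SSE:", round(sse, 2))
print("r^2:", round(r_squared, 4))
```

With these made-up numbers SSE is far smaller than SST, so most of the variability in weight is explained by the regression on height.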
It can be shown algebraically, although it isn't easy to do so, that this $r^2$ is actually the square of the familiar r, the correlation coefficient. Many computer programs will report the value of $r^2$ only (usually as "R-sq"), which means that we must take the square root of $r^2$ if we want to know r. Remember that r and b, the slope of the regression line, always have the same sign, so you can check the sign of b to determine the sign of r if all you are given is $r^2$. The TI-83/84 calculator will report both r and $r^2$, as well as the regression coefficients, when you do LinReg(a+bx).
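The two claims above can be checked numerically. The sketch below is only an illustration: the helper name `summarize` and both data sets are hypothetical. It computes r from its usual formula, computes the coefficient of determination from SST and SSE, and then recovers r from that value plus the sign of the slope.

```python
import math

def summarize(xs, ys):
    """Return slope b, correlation r, and (SST - SSE)/SST for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # this is SST
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return b, r, (syy - sse) / syy

# Hypothetical data: one positive association, one negative association.
pos = summarize([62, 65, 68, 70, 75], [125, 130, 150, 165, 180])
neg = summarize([1, 2, 3, 4, 5], [9, 8, 5, 4, 2])

for b, r, r_squared in (pos, neg):
    # Recover r from r^2 alone by borrowing the sign of the slope b.
    r_recovered = math.copysign(math.sqrt(r_squared), b)
    print(round(r, 4), round(r ** 2, 4), round(r_squared, 4), round(r_recovered, 4))
```

In each printed row, the second and third values agree (the square of r equals the proportion of variability explained), and the last value matches the first, including its sign.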