Coefficient of Determination for AP Statistics
Practice problems for these concepts can be found at:
- Two-Variable Data Analysis Practice Problems for AP Statistics
- Two-Variable Data Analysis Cumulative Review Problems for AP Statistics
- Two-Variable Data Analysis Rapid Review for AP Statistics
In the absence of a better way to predict y-values from x-values, our best guess for any given x might well be ȳ, the mean value of y.
Example: Suppose you had access to the heights and weights of each of the students in your statistics class. You compute the average weight of all the students. You write the heights of each student on a slip of paper, put the slips in a hat, and then draw out one slip. You are asked to predict the weight of the student whose height is on the slip of paper you have drawn. What is your best guess as to the weight of the student?
Solution: In the absence of any known relationship between height and weight, your best guess would have to be the average weight of all the students. You know the weights vary about the average, and that is about the best you could do.
If we guessed at the weight of each student using the average, we would be wrong most of the time. If we took each of those errors and squared them, we would have what is called the sum of squares total (SST). It's the total squared error of our guesses when our best guess is simply the mean of the weights of all students, and represents the total variability of y.
Now suppose we have a least-squares regression line that we want to use as a model for predicting weight from height. It is, of course, the LSRL we discussed in detail earlier in this chapter, and our hope is that there will be less error in prediction than by using ȳ. Now, we still have errors from the regression line (called residuals, remember?). We call the sum of the squares of those errors the sum of squared errors (SSE). So, SST represents the total error from using ȳ as the basis for predicting weight from height, and SSE represents the total error from using the LSRL. SST – SSE represents the benefit of using the regression line rather than ȳ for prediction. That is, by using the LSRL rather than ȳ, we have explained a certain proportion of the total variability by regression.
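As a sketch of these two quantities, here is a short Python example using made-up height/weight data (the data values are hypothetical, chosen only to illustrate the computation):

```python
# Hypothetical heights (inches) and weights (pounds) for illustration.
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 165, 180]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n   # ybar, our "no-model" prediction for every student

# Least-squares slope and intercept: b = Sxy / Sxx, a = ybar - b*xbar
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
b = sxy / sxx
a = mean_y - b * mean_x

# SST: total squared error from guessing the mean weight every time.
sst = sum((y - mean_y) ** 2 for y in weights)

# SSE: sum of squared residuals from the regression line's predictions.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))

print("SST =", sst)
print("SSE =", sse)   # SSE is smaller: the line beats the mean as a predictor
```

Note that SSE can never exceed SST, because the least-squares line is chosen precisely to make the sum of squared residuals as small as possible, and predicting ȳ everywhere is one of its options (a horizontal line).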
The proportion of the total variability in y that is explained by the regression of y on x is called the coefficient of determination. The coefficient of determination is symbolized by r2. Based on the above discussion, we note that r2 = (SST – SSE)/SST.
It can be shown algebraically, although it isn't easy to do so, that this r2 is actually the square of the familiar r, the correlation coefficient. Many computer programs will report the value of r2 only (usually as "R-sq"), which means that we must take the square root of r2 if we only want to know r (remember that r and b, the slope of the regression line, are either both positive or both negative, so you can check the sign of b to determine the sign of r if all you are given is r2). The TI-83/84 calculator will report both r and r2, as well as the regression coefficients a and b, when you do LinReg(a+bx) (provided the Diagnostics are turned on).
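Continuing the sketch above (same hypothetical data), we can check numerically that (SST – SSE)/SST agrees with the square of the correlation coefficient r:

```python
# Hypothetical height/weight data, for illustration only.
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 165, 180]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Building-block sums for both the correlation and the regression line.
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

b = sxy / sxx           # slope of the LSRL
a = mean_y - b * mean_x  # intercept of the LSRL

sst = syy  # total variability of y is the same quantity as Syy
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))

r = sxy / (sxx * syy) ** 0.5      # correlation coefficient
r_squared = (sst - sse) / sst     # coefficient of determination

print(round(r ** 2, 10) == round(r_squared, 10))  # the two agree
```

The last line is the numerical version of the algebraic identity in the text: the proportion of variability explained by the regression is exactly the square of r. Note also that r here has the same sign as the slope b, as mentioned above.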