Two-Variable Data Analysis Free Response Practice Problems for AP Statistics (page 2)

By — McGraw-Hill Professional
Updated on Feb 5, 2011


  1. Thus, ŷ = –11.9 + 2.2x.
    1. There seems to be a moderate positive relationship between the scores: students who did better on the first test tend to do better on the second, but the relationship isn't very strong; r = 0.55.
  3. A line is not a good model for the data because the residual plot shows a definite pattern: the first 8 points have negative residuals and the last 8 points have positive residuals. The box is in a cluster of points with positive residuals. We know that, for any given point, the residual equals actual value minus predicted value. Because actual – predicted > 0, we have actual > predicted, so that the regression equation is likely to underestimate the actual value.
  4. The regression equation for predicting time from year is time = 79.21 – 0.61(year). We need time < 60. Solving 60 = 79.21 – 0.61(year), we get year = 31.3. So, we would predict that times will drop under one minute in about 31 or 32 years. The problem with this is that we are extrapolating far beyond the data. Extrapolation is dangerous in any circumstance, and especially so 24 years beyond the last known time. It's likely that the rate of improvement will decrease over time.
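The threshold can be checked by solving the fitted line for a time of 60 seconds. The following is a minimal sketch using the rounded coefficients quoted in the answer; the function name is illustrative and not from the source. Because the coefficients are rounded, the result differs slightly from the 31.3 reported, which presumably came from unrounded values.

```python
def year_when_time_reaches(target, intercept=79.21, slope=-0.61):
    """Solve target = intercept + slope * year for year."""
    return (target - intercept) / slope


# Year at which the fitted line predicts a time of exactly 60 seconds.
year = year_when_time_reaches(60.0)
print(round(year, 1))
```

Any year past this value gives a predicted time under one minute, but only if the linear trend continues to hold that far outside the observed data.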
  5. A scatterplot of the data (graph on the left) appears to be exponential. Taking the natural logarithm of each y-value, the scatterplot (graph on the right) appears to be more linear. Taking the natural logarithm of each y-value and finding the LSRL, we have ln(#Roaches) = 0.914 + 0.108(Days). For Days = 9, ln(#Roaches) = 0.914 + 0.108(9) = 1.89. Then #Roaches = e^1.89 = 6.62.
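The back-transformation step can be sketched in Python as follows. The coefficients are the ones quoted in the answer; the function name is illustrative. Note that the answer rounds the log-scale prediction to 1.89 before exponentiating, so its 6.62 differs slightly from the unrounded result computed here.

```python
import math


def predict_roaches(days, intercept=0.914, slope=0.108):
    """Predict on the log scale, then back-transform with exp."""
    ln_count = intercept + slope * days
    return ln_count, math.exp(ln_count)


ln_count, count = predict_roaches(9)
print(round(ln_count, 2))  # prediction on the log scale
print(round(count, 2))     # back-transformed prediction (unrounded exponent)
```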
  6. [Figure not recovered.]

  7. The correlation between walking more and better health may or may not be causal. It may be that people who are healthier walk more. It may be that some other variable, such as general health consciousness, results in walking more and in better health. There may be a causal association, but in general, correlation is not causation.
  8. Carla has reported the value of r², the coefficient of determination. If she had predicted each girl's grade based on the average grade only, there would have been a large amount of variability. But, by considering the regression of grades on socioeconomic status, she has reduced the total amount of variability by 72%. Because r² = 0.72, r = 0.85, which is indicative of a strong positive linear relationship between grades and socioeconomic status. Carla has reason to be happy.
    1. is false. ∑(y – ŷ) = 0 for the LSRL, but there is no unique line for which this is true.
    2. is true.
    3. is true. In fact, this is the definition of the LSRL—it is the line that minimizes the sum of the squared residuals.
    4. is true since b = r(s_y/s_x) and s_y/s_x is constant.
    5. is false. The slope of the regression line tells you by how much the response variable changes on average for each unit change in the explanatory variable.
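Statements 1 to 3 can be illustrated numerically. The sketch below uses a small hypothetical data set (invented for illustration, not from the problem) and shows that every line through the point of means has residuals summing to zero, while only the least-squares slope minimizes the sum of squared residuals.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

x_bar, y_bar = mean(xs), mean(ys)


def line_through_means(slope):
    """Return (intercept, slope) of the line through (x_bar, y_bar)."""
    return y_bar - slope * x_bar, slope


def residuals(intercept, slope):
    """Actual minus predicted for every data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

for slope in (b, 0.5, 3.0):
    res = residuals(*line_through_means(slope))
    total = sum(res)
    sse = sum(r * r for r in res)
    print(f"slope={slope:.3f}  sum={total:+.6f}  sum_sq={sse:.4f}")
```

All three lines report a residual sum of zero (up to floating-point error), but the sum of squares is smallest for the least-squares slope.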
  10. ŷ = 26.211 – 0.25x = 26.211 – 0.25(73) = 7.961. The residual for x = 73 is the actual value at 73 minus the predicted value at 73, or y – ŷ = 7.9 – 7.961 = –0.061. The point (73, 7.9) is below the LSRL since y – ŷ < 0 → y < ŷ.
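The residual arithmetic can be reproduced with a few lines of Python. This is a minimal sketch using the coefficients and the point (73, 7.9) from the answer; the function name is illustrative.

```python
def residual(x, y_actual, intercept=26.211, slope=-0.25):
    """Return (predicted value, residual), where residual = actual - predicted."""
    y_hat = intercept + slope * x
    return y_hat, y_actual - y_hat


y_hat, res = residual(73, 7.9)
print(round(y_hat, 3), round(res, 3))
# A negative residual means the observed point lies below the fitted line.
print("below line" if res < 0 else "on or above line")
```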
    1. r = +0.75; the slope is positive and is the opposite of the original slope.
    2. r = –0.75. It doesn't matter which variable is called x and which is called y.
    3. r = –0.75; the slope is the same as the original slope.
  12. We know that b = r(s_y/s_x), so that 2.7 = r(3.33) → r = 2.7/3.33 = 0.81 → r² = 0.66. The proportion of the variability that is not explained by the regression of y on x is 1 – r² = 1 – 0.66 = 0.34.
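The chain from slope to unexplained variability can be sketched as follows. It assumes the factor 3.33 multiplying r is the ratio s_y/s_x, as the relationship b = r(s_y/s_x) implies; the variable names are illustrative.

```python
b = 2.7          # slope of the regression line, from the answer
sd_ratio = 3.33  # assumed to be s_y / s_x, the factor multiplying r

r = b / sd_ratio          # correlation coefficient
r_squared = r ** 2        # proportion of variability explained
unexplained = 1 - r_squared

print(round(r, 2), round(r_squared, 2), round(unexplained, 2))
```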
  13. Because the linear pattern will be stronger, the correlation coefficient will increase. The influential point pulls up on the regression line so that its removal would cause the slope of the regression line to decrease.
    1. rate = –0.3980 + 0.1183 (number).
    2. r = √(r²) = 0.987 (r is positive since the slope is positive).
    3. rate = –0.3980 + 0.1183(20) = 1.97 crimes per thousand employees. Be sure to use 20, not 200.
    1. Percentage appreciation = 1.897 + 0.115 (number)
    2. Percentage appreciation = 1.897 + 0.115(85) = 11.67%.
    3. r = 0.82, which indicates a strong linear relationship between the number of new homes built and percent appreciation.
    4. If the number of new homes built was unknown, your best estimate would be the average percentage appreciation for the 5 years. In this case, the average percentage appreciation is 11.3%. [For what it's worth, the average error (absolute value) using the mean to estimate appreciation is 2.3; for the regression line, it's 1.3.]
    1. If r² = 0.81, then r = ±0.9. The slope of the regression line for the standardized data is either 0.9 or –0.9.
    2. If r = 0.9, the scatterplot shows a strong positive linear pattern between the variables. Values above the mean on one variable tend to be above the mean on the other, and values below the mean on one variable tend to be below the mean on the other. If r = –0.9, there is a strong negative linear pattern to the data. Values above the mean on one variable are associated with values below the mean on the other.
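The claim that the regression slope for standardized data equals r can be checked numerically. The sketch below uses hypothetical data (invented for illustration) and hand-written helpers, standardizes both variables, and compares the least-squares slope with the correlation.

```python
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [1.5, 3.0, 4.5, 5.0, 8.0]


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ls_slope(us, vs):
    """Least-squares slope for predicting vs from us."""
    u_bar, v_bar = mean(us), mean(vs)
    num = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    den = sum((u - u_bar) ** 2 for u in us)
    return num / den


def correlation(us, vs):
    """r as the sum of products of z-scores divided by n - 1."""
    zu, zv = standardize(us), standardize(vs)
    return sum(a * b for a, b in zip(zu, zv)) / (len(us) - 1)


r = correlation(xs, ys)
slope_std = ls_slope(standardize(xs), standardize(ys))
print(round(r, 4), round(slope_std, 4))
```

The two printed values agree, which is why r² = 0.81 pins the standardized slope down to 0.9 or –0.9.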
    1. r = 0.8
    2. r = 0.0
    3. r = –1.0
    4. r = –0.5
  18. Each of the points lies on the regression line → every residual is 0 → the sum of the squared residuals is 0.
    1. r = 0.90 for these data, indicating that there is a strong positive linear relationship between student averages and evaluations of Prof. Socrates. Furthermore, r² = 0.82, which means that most of the variability in student evaluations can be explained by the regression of student evaluations on student average.
    2. If y is the evaluation score of Prof. Socrates and x is the corresponding average for the student who gave the evaluation, then ŷ = –29.3 + 1.34x. If x = 80, then ŷ = –29.3 + 1.34(80) = 77.9, or 78.
    1. True, because b = r(s_y/s_x) and s_y/s_x is positive.
    2. True. r is the same if explanatory and response variables are reversed. This is not true, however, for the slope of the regression line.
    3. False. Because r is defined in terms of the means of the x and y variables, it is not resistant.
    4. False. r does not depend on the units of measurement.
    5. True. The definition of r, r = (1/(n – 1)) ∑ [(x – x̄)/s_x][(y – ȳ)/s_y], necessitates that the variables be numerical, not categorical.
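Several of these properties can be checked numerically. The sketch below uses hypothetical height and weight values (invented for illustration) and a hand-written correlation helper to show that r is unchanged when the variables are swapped or the units change, and that a single outlier can move it substantially.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r as the average product of z-scores, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical data, invented for illustration only.
heights_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 77.0]

r = correlation(heights_cm, weights_kg)

# Swapping explanatory and response variables leaves r unchanged.
r_swapped = correlation(weights_kg, heights_cm)

# Changing units (centimeters to inches) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
r_inches = correlation(heights_in, weights_kg)

# One unusual point changes r a great deal: r is not resistant.
r_outlier = correlation(heights_cm + [185.0], weights_kg + [40.0])

print(round(r, 3), round(r_swapped, 3), round(r_inches, 3), round(r_outlier, 3))
```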
    1. Left-hand strength = 7.1 + 0.35(12) = 11.3 kg.
    2. Intercept: The predicted left-hand strength of a person who has zero right-hand strength is 7.1 kg.
    3. Slope: On average, left-hand strength increases by 0.35 kg for each 1 kg increase in right-hand strength. Or left-hand strength is predicted to increase by 0.35 kg for each 1 kg increase in right-hand strength.

