One-Variable Data Analysis Free Response Practice Problems for AP Statistics (page 2)
Review the following concepts if necessary:
- Graphical Analysis for AP Statistics
- Histogram for AP Statistics
- Measures of Center for AP Statistics
- Measures of Spread for AP Statistics
- Position of a Term in a Distribution for AP Statistics
- Normal Distribution for AP Statistics
- Mickey Mantle played with the New York Yankees from 1951 through 1968. He had the following number of home runs for those years: 13, 23, 21, 27, 37, 52, 34, 42, 31, 40, 54, 30, 15, 35, 19, 23, 22, 18. Were any of these years outliers? Explain.
- Which of the following are properties of the normal distribution? Explain your answers.
- (a) It has a mean of 0 and a standard deviation of 1.
- (b) Its mean = median = mode.
- (c) All terms in the distribution lie within four standard deviations of the mean.
- (d) It is bell-shaped.
- (e) The total area under the curve and above the horizontal axis is 1.
- Make a stemplot for the number of home runs hit by Mickey Mantle during his career (from question #1, the numbers are: 13, 23, 21, 27, 37, 52, 34, 42, 31, 40, 54, 30, 15, 35, 19, 23, 22, 18). Do it first using an increment of 10, then do it again using an increment of 5. What can you see in the second graph that was not obvious in the first?
- A group of 15 students were identified as needing supplemental help in basic arithmetic skills. Two of the students were put through a pilot program and achieved scores of 84 and 89 on a test of basic skills after the program was finished. The other 13 students received scores of 66, 82, 76, 79, 72, 98, 75, 80, 76, 55, 77, 68, and 69. Find the z-scores for the students in the pilot program and comment on the success of the program.
- For the 15 students whose scores were given in question #4, find the five-number summary and construct a boxplot of the data. What are the distinguishing features of the graph?
- Assuming that the batting averages in major league baseball over the years have been approximately normally distributed with a mean of 0.265 and a standard deviation of 0.032, what would be the percentile rank of a player who bats 0.370 (as Barry Bonds did in the 2002 season)?
- In problem #1, we considered the home runs hit by Mickey Mantle during his career. The following is a stemplot of the number of doubles hit by Mantle during his career. What is the interquartile range (IQR) of this data? (Hint: n = 18.)
- For the histogram pictured below, what proportion of the terms are less than 3.5?
- The following graph shows boxplots for the number of career home runs for Hank Aaron and Barry Bonds. Comment on the graphs. Which player would you rather have on your team most seasons? A season in which you needed a lot of home runs?
- Suppose that having a cholesterol level in the top 20% is considered dangerous. Assume that cholesterol levels are approximately normally distributed with mean 185 and standard deviation 25. What is the maximum cholesterol level you can have and not be in the top 20%?
- The following are the salaries, in millions of dollars, for members of the 2001–2002 Golden State Warriors: 6.2, 5.4, 5.4, 4.9, 4.4, 4.4, 3.4, 3.3, 3.0, 2.4, 2.3, 1.3, 0.3, 0.3. Which gives a better "picture" of these salaries, mean-based or median-based statistics? Explain.
- The following table gives the results of an experiment in which the ages of 525 pennies from current change were recorded. "0" represents the current year, "1" represents pennies one year old, etc.
- A wealthy woman is trying to decide whether or not to buy a coin collection that contains 1450 coins. She will buy the collection only if at least 225 of the coins are worth more than $170. The present owner of the collection reports that the average coin in the collection is worth $130 with a standard deviation of $15. Should the woman buy the collection?
- The mean of a set of 150 values is 35, its median is 33, its standard deviation is 6, and its IQR is 12. A new set is created by first subtracting 10 from every term and then multiplying by 5. What are the mean, median, variance, standard deviation, and IQR of the new set?
- The following graph shows the distribution of the heights of 300 women whose average height is 65" and whose standard deviation is 2.5". Assume that the heights of women are approximately normally distributed. How many of the women would you expect to be less than 5'2" tall?
- Which of the following are properties of the standard deviation? Explain your answer.
- (a) It's the square root of the average squared deviation from the mean.
- (b) It's resistant to extreme values.
- (c) It's independent of the number of terms in the distribution.
- (d) If you added 25 to every value in the dataset, the standard deviation wouldn't change.
- (e) The interval x̄ ± 2s contains 50% of the data in the distribution.
- Look again at the salaries of the Golden State Warriors in question #11 (in millions: 6.2, 5.4, 5.4, 4.9, 4.4, 4.4, 3.4, 3.3, 3.0, 2.4, 2.3, 1.3, 0.3, 0.3). Erick Dampier was the highest paid player at $6.2 million. What sort of raise would he need so that his salary would be an outlier among these salaries?
- Given the histogram below, draw, as best you can, the boxplot for the same data.
- On the first test of the semester, the class average was 72 with a standard deviation of 6. On the second test, the class average was 65 with a standard deviation of 8. Nathan scored 80 on the first test and 76 on the second. Compared to the rest of the class, on which test did Nathan do better?
- What is the mean of a set of data where s = 20, Σx = 245, and Σ(x - x̄)² = 13,600?
Describe the distribution of the ages of the pennies in question #12 (remember that the instruction "describe" means to discuss center, spread, and shape). Justify your answer.
- Using the calculator, we find that x̄ = 29.78, s = 11.94, Q1 = 21, Q3 = 37. Using the 1.5(IQR) rule, outliers are values that are less than 21 - 1.5(37 - 21) = -3 or greater than 37 + 1.5(37 - 21) = 61. Because no values lie outside of those boundaries, there are no outliers by this rule.
- (a) is a property of the standard normal distribution, not a property of normal distributions in general. (b) is a property of the normal distribution. (c) is not a property of the normal distribution: almost all of the terms are within four standard deviations of the mean, but, at least in theory, there are terms at any given distance from the mean. (d) is a property of the normal distribution; the normal curve is the perfect bell-shaped curve. (e) is a property of the normal distribution and is the property that makes this curve useful as a probability density curve.
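- x̄ = 76.4 and s = 10.17. The z-scores for the two students in the pilot program are z = (84 - 76.4)/10.17 ≈ 0.75 and z = (89 - 76.4)/10.17 ≈ 1.24.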
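- z = (0.370 - 0.265)/0.032 ≈ 3.28. The area to the left of z = 3.28 is 0.9995, so a player who bats 0.370 would be at about the 99.95th percentile.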
- There are 18 values in the stemplot. The median is 17 (it lies between the last two 7s in the row marked by the (5) in the count column of the plot, so it is still 17). Because there are 9 values in each half of the stemplot, the median of the lower half of the data, Q1, is the 5th score from the top. So, Q1 = 14. Q3 = the 5th score counting from the bottom = 24. Thus, IQR = 24 - 14 = 10.
- There are 3 values in the first bar, 6 in the second, 2 in the third, 9 in the fourth, and 5 in the fifth for a total of 25 values in the dataset. Of these, 3 + 6 + 2 = 11 are less than 3.5. There are 25 terms altogether, so the proportion of terms less than 3.5 is 11/25 = 0.44.
- With the exception of the one outlier for Bonds, the most obvious thing about these two boxplots is how similar they are. The medians are almost identical and the IQRs are very similar. The data do not show it, but, with the exception of 2001, the year Bonds hit 73 home runs, neither batter ever hit 50 or more home runs in a season. So, for any given season, you should be overjoyed to have either on your team, but there is no good reason to choose one over the other. However, if you based your decision on who had the most home runs in a single season, you would certainly choose Bonds.
- Let x be the value in question. Because we do not want to be in the top 20%, the area to the left of x is 0.8. Hence z = 0.84 (found by locating the table entry nearest to 0.8, which is 0.7995, and reading the corresponding z-score as 0.84). Then 0.84 = (x - 185)/25, so x = 185 + 0.84(25) = 206.
- x̄ = $3.36 million, s = $1.88 million, Med = $3.35 million, IQR = $2.6 million. A boxplot of the data looks like this:
- The easiest way to do this is to use the calculator. Put the age data in L1 and the frequencies in L2. Then do 1-Var Stats L1,L2 (the calculator will read the second list as frequencies for the first list).
- The mean is 2.48 years, and the median is 2 years. This indicates that the mean is being pulled to the right, so the distribution is skewed to the right or has outliers in the direction of the larger values.
- The standard deviation is 2.61 years. Because one standard deviation to the left of the mean would yield a negative value, this also indicates that the distribution extends further to the right than to the left.
- A histogram of the data, drawn on the TI-83/84, is shown below. It clearly indicates that the distribution of the ages of these pennies is skewed to the right.
- Since we don't know the shape of the distribution of coin values, we must use Chebyshev's rule to help us solve this problem. Let k = the number of standard deviations that 170 is above the mean. Then 130 + k · (15) = 170, so k ≈ 2.67. Thus, at most 1/k² = 1/(2.67)² ≈ 0.14, or 14%, of the coins are valued at more than $170. Her requirement was that at least 225/1450 ≈ 0.155, or 15.5%, of the coins must be valued at more than $170. Since at most 14% can be valued that highly, she should not buy the collection.
- The new mean is 5(35 - 10) = 125.
- First we need to find the proportion of women who would be less than 62'' tall. Standardizing, z = (62 - 65)/2.5 = -1.2, and the area to the left of z = -1.2 is 0.1151.
- (a), (c), and (d) are properties of the standard deviation. (a) serves as a definition of the standard deviation. The standard deviation is independent of the number of terms in the distribution (c) in the sense that simply adding more terms will not necessarily increase or decrease s. (d) is another way of saying that the standard deviation is independent of the mean: it's a measure of spread, not a measure of center.
- For these data, Q1 = $2.3 million and Q3 = $4.9 million. To be an outlier, Erick would need to make more than 4.9 + 1.5(4.9 - 2.3) = $8.8 million. In other words, he would need a raise of more than $2.6 million for his salary to be an outlier.
- You need to estimate the median and the quartiles. Note that the histogram is skewed to the left, so that the scores tend to pack to the right. This means that the median is to the right of center and that the boxplot would have a long whisker to the left. The boxplot looks like this:
- If you standardize both scores, you can compare them on the same scale. Accordingly, z = (80 - 72)/6 ≈ 1.33 for the first test and z = (76 - 65)/8 ≈ 1.38 for the second test.
For question #1, using the x̄ ± 2s rule instead, we have x̄ ± 2s = 29.78 ± 2(11.94) = (5.9, 53.66). By this standard, the year he hit 54 home runs would be considered an outlier.
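Both outlier checks for question #1 can be reproduced as a minimal Python sketch. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (middle value excluded when n is odd), a convention that reproduces Q1 = 21 and Q3 = 37 here; other software may compute quartiles differently. The helper names `quartiles`, `iqr_fences`, and `two_sd_interval` are illustrative, not from any library.

```python
from statistics import mean, median, stdev


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median-of-halves convention)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]  # skip the middle value when n is odd
    return median(lower), median(upper)


def iqr_fences(data, k=1.5):
    """Lower and upper fences for the k * IQR outlier rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def two_sd_interval(data):
    """The interval mean ± 2s, using the sample standard deviation."""
    m, s = mean(data), stdev(data)
    return m - 2 * s, m + 2 * s


home_runs = [13, 23, 21, 27, 37, 52, 34, 42, 31, 40, 54, 30, 15, 35, 19, 23, 22, 18]

low, high = iqr_fences(home_runs)
print("1.5(IQR) fences:", low, high)
print("IQR-rule outliers:", [x for x in home_runs if x < low or x > high])

lo2, hi2 = two_sd_interval(home_runs)
print(f"mean ± 2s: ({lo2:.2f}, {hi2:.2f})")
print("outside mean ± 2s:", [x for x in home_runs if x < lo2 or x > hi2])
```

Run as is, it should find nothing outside the fences of -3 and 61, and only 54 outside the mean ± 2s interval of roughly (5.89, 53.67); the small difference from (5.9, 53.66) comes from rounding x̄ and s before doubling.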
For question #3, what shows up with an increment of 5, but not with an increment of 10, is the gap between 42 and 52. In 16 out of 18 years, Mantle hit 42 or fewer home runs; he hit more than 50 only twice.
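As a hedged sketch for question #3, the function below builds a stemplot for nonnegative two-digit integers; with an increment of 5 it splits each stem into one line for leaves 0 to 4 and one for leaves 5 to 9. The name `stemplot` and its return format (a list of (stem, leaves) pairs) are assumptions made for illustration.

```python
from collections import defaultdict


def stemplot(data, increment=10):
    """Return stemplot lines as (stem, leaves) pairs for nonnegative two-digit integers."""
    if increment not in (10, 5):
        raise ValueError("increment must be 10 or 5")
    parts = 10 // increment  # 1 line per tens digit, or 2 split lines
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows[stem * parts + leaf // increment].append(leaf)
    # Walk every line from the smallest value to the largest so empty lines (gaps) appear
    return [(i // parts, rows.get(i, [])) for i in range(min(rows), max(rows) + 1)]


home_runs = [13, 23, 21, 27, 37, 52, 34, 42, 31, 40, 54, 30, 15, 35, 19, 23, 22, 18]

for inc in (10, 5):
    print(f"Increment {inc}:")
    for stem, leaves in stemplot(home_runs, inc):
        print(f"  {stem} | {' '.join(map(str, leaves))}")
```

With an increment of 5, the second line for stem 4 (leaves 5 to 9) comes out empty; that empty line is the gap between 42 and 52.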
For question #4, using the Standard Normal Probability table, a score of 84 corresponds to the 77.34th percentile, and a score of 89 corresponds to the 89.25th percentile. Both students were in the top quartile of scores after the program and performed better than all but one of the other students. We don't know that there is a cause-and-effect relationship between the pilot program and the high scores (that would require comparisons with a pretest), but it's reasonable to assume that the program had a positive impact. You might wonder how the student who got the 98 did so well!
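The z-scores and percentiles for question #4 can be checked with the standard-library `statistics.NormalDist`. This sketch assumes, as the answer does, that the mean and standard deviation are computed from all 15 scores and that z is rounded to two decimals before the lookup, as a printed table would require.

```python
from statistics import NormalDist, mean, stdev

scores = [84, 89, 66, 82, 76, 79, 72, 98, 75, 80, 76, 55, 77, 68, 69]
xbar, s = mean(scores), stdev(scores)  # 76.4 and about 10.17

for x in (84, 89):
    z = (x - xbar) / s
    # Round z to two decimals to mimic a table lookup, then take the area to the left
    pct = NormalDist().cdf(round(z, 2))
    print(f"score {x}: z = {z:.2f}, percentile ≈ {100 * pct:.2f}")
```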
For question #5, the most distinguishing feature is that the range (43) is quite large compared to the middle 50% of the scores (13). That is, we can see from the graph that the scores are packed somewhat closely about the median. The shape of a histogram of the data would be symmetric and mound-shaped.
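A minimal sketch of the five-number summary for question #5. It assumes the median-of-halves quartile convention (some software interpolates quartiles instead and may give slightly different values); the function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max), with quartiles as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + len(xs) % 2:])  # skip the middle value when n is odd
    return xs[0], q1, median(xs), q3, xs[-1]


scores = [84, 89, 66, 82, 76, 79, 72, 98, 75, 80, 76, 55, 77, 68, 69]
mn, q1, med, q3, mx = five_number_summary(scores)
print("min, Q1, median, Q3, max:", (mn, q1, med, q3, mx))
print("range:", mx - mn, "IQR:", q3 - q1)
```

For these 15 scores it gives 55, 69, 76, 82, 98, consistent with the range of 43 and the middle-50% spread of 13 quoted above.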
[For question #10, using the calculator, the solution to this problem is given by invNorm(0.8,185,25).]
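Outside the calculator, `statistics.NormalDist.inv_cdf` plays the role of `invNorm`, and `cdf` replaces a table lookup. This sketch covers question #10 and, for comparison, the batting-average percentile of question #6; both assume the normal models stated in those problems.

```python
from statistics import NormalDist

# Question #10: cholesterol ~ N(185, 25); the 80th percentile is the largest level outside the top 20%
cholesterol = NormalDist(mu=185, sigma=25)
cutoff = cholesterol.inv_cdf(0.8)
print(f"max level outside the top 20%: {cutoff:.1f}")

# Question #6: batting averages ~ N(0.265, 0.032); percentile rank of a 0.370 hitter
batting = NormalDist(mu=0.265, sigma=0.032)
pct = batting.cdf(0.370)
print(f"percentile rank of 0.370: {100 * pct:.2f}")
```

Because `inv_cdf` uses the unrounded z ≈ 0.8416 rather than the table value 0.84, the cutoff comes out near 206.04 rather than 185 + 0.84(25) = 206.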
For question #11, the fact that the mean and median are virtually the same, and that the boxplot shows that the data are more or less symmetric, indicates that either set of measures would be appropriate.
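For questions #11 and #17, a short sketch can recompute the Warriors summary statistics and the upper 1.5(IQR) fence. It again assumes the median-of-halves quartile convention, which gives Q1 = 2.3 and Q3 = 4.9 for these 14 salaries (in millions of dollars).

```python
from statistics import mean, median, stdev

salaries = [6.2, 5.4, 5.4, 4.9, 4.4, 4.4, 3.4, 3.3, 3.0, 2.4, 2.3, 1.3, 0.3, 0.3]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[half + len(xs) % 2:])


q1, q3 = quartiles(salaries)
iqr = q3 - q1
print(f"mean = {mean(salaries):.2f}, median = {median(salaries):.2f}")
print(f"s = {stdev(salaries):.2f}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.1f}")

# Question #17: a salary above the upper fence would be an outlier
fence = q3 + 1.5 * iqr
print(f"upper fence = {fence:.1f}; raise needed: more than {fence - max(salaries):.1f}")
```

The values are printed rounded because binary floating point turns 4.9 - 2.3 into 2.6000000000000005.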
For question #14, the new median is 5(33 - 10) = 115.
The new variance is 5²(6²) = 900.
The new standard deviation is 5(6) = 30.
The new IQR is 5(12) = 60.
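The rules behind question #14 fit in a small function: subtracting a constant shifts the measures of center but leaves the measures of spread alone, while multiplying by a constant scales the mean, median, standard deviation, and IQR by that constant (its absolute value, for spread) and scales the variance by its square. `transform_summary` is a hypothetical helper that works from the summary statistics only, not the raw data.

```python
def transform_summary(mean, median, sd, iqr, subtract, multiply):
    """Summary statistics of y = multiply * (x - subtract), given those of x."""
    return {
        "mean": multiply * (mean - subtract),
        "median": multiply * (median - subtract),
        "variance": (multiply * sd) ** 2,  # the shift drops out; the scale factor is squared
        "sd": abs(multiply) * sd,
        "iqr": abs(multiply) * iqr,
    }


print(transform_summary(mean=35, median=33, sd=6, iqr=12, subtract=10, multiply=5))
```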
For question #15, then, 0.1151 of the terms in the distribution would be less than 62''. This means that 0.1151(300) = 34.53, so you would expect 34 or 35 of the women to be less than 62'' tall.
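A quick check of question #15 with `statistics.NormalDist`, assuming, as the problem does, that heights are normally distributed with mean 65'' and standard deviation 2.5''.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=2.5)
z = (62 - 65) / 2.5
p = heights.cdf(62)  # proportion of women shorter than 62 inches (5'2")
print(f"z = {z:.1f}, proportion = {p:.4f}, expected count = {300 * p:.2f}")
```

The expected count prints as about 34.52 rather than 34.53 because it uses the unrounded proportion instead of 0.1151.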
For question #16, the standard deviation is not resistant to extreme values (b) because it is based on the mean, not the median. (e) is a statement about the interquartile range. In general, unless we know something about the curve, we don't know what proportion of terms are within 2 standard deviations of the mean.
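The properties in question #16 can be probed numerically. The sketch below uses a small hypothetical dataset to show that the definitional formula (with n - 1 in the denominator, as for the calculator's s) matches `statistics.stdev`, that adding 25 to every value leaves s unchanged, and that swapping in one extreme value inflates s. The last lines apply the same definition to question #20, assuming s is the n - 1 version: since s² = Σ(x - x̄)²/(n - 1), we get n - 1 = 13,600/400 = 34, so n = 35 and x̄ = 245/35 = 7.

```python
from math import sqrt
from statistics import stdev


def sd_from_definition(data):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))


data = [10, 12, 13, 15, 20]  # hypothetical values, for illustration only

print(sd_from_definition(data), stdev(data))  # (a): both give the same s
print(stdev([x + 25 for x in data]))          # (d): shifting every value leaves s unchanged
print(stdev(data[:-1] + [200]))               # (b): one extreme value inflates s

# Question #20: n - 1 = sum of squared deviations / s^2, then mean = sum(x) / n
s, sum_x, ss = 20, 245, 13_600
n = ss / s**2 + 1
print(n, sum_x / n)
```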
For question #19, Nathan did slightly, but only slightly, better on the second test.
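The comparison in question #19 comes down to two z-scores. A minimal sketch, with `z_score` as an illustrative helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


z1 = z_score(80, mean=72, sd=6)  # first test
z2 = z_score(76, mean=65, sd=8)  # second test
print(f"test 1: z = {z1:.2f}, test 2: z = {z2:.3f}")
```

z is about 1.33 on the first test and 1.375 on the second, which is why the advantage on the second test is described as slight.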