### SECTION II

Time: 1 hour and 30 minutes

Number of problems: 6

Percentage of total grade: 50

### General Instructions

There are two parts to this section of the examination. Part A consists of five equally weighted problems that represent 75% of the total weight of this section. Spend about 65 minutes on this part of the exam. Part B consists of one longer problem that represents 25% of the total weight of this section. Spend about 25 minutes on this part of the exam. You are not necessarily expected to complete all parts of every question. Statistical tables and formulas are provided.

- Be sure to write clearly and legibly. If you make an error, you may save time by crossing it out rather than trying to erase it. Erased or crossed-out work will not be graded.
- Show all your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as the accuracy of your final answers. Correct answers without support work may not receive credit.

### Statistics, Section II, Part A, Questions 1–5

Spend about 65 minutes on this part of the exam; percentage of Section II grade: 75.

**Directions:** Show all your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanation.

- David was comparing the number of vocabulary words children know about transportation at various ages. He fit a least-squares regression line to the data. The residual plot and part of the computer output for the regression are given below.
- Is a line an appropriate model for these data? Explain.
- What is the equation of the least-squares regression line for predicting the number of words from age?
- What is the predicted number of words for a child of 7.5 years of age?
- Interpret the slope of the regression line in the context of the problem.
- Would it be appropriate to use the model to predict the number of words a 12-year-old would know?
- Students at Dot.Com Tech are allowed to sign up for one math class each semester. The numbers in each grade level signing up for various classes for next semester are given in the following table.
- What is the probability that a student will take calculus?
- What is the probability that a 12th grader will take either analysis or calculus?
- What is the probability that a person taking algebra II is a 10th grader?
- Consider the events, "A student takes geometry" and "A student is a 10th grader." Are these events independent? Justify your answer.
- The state in which you reside is undergoing a significant budget crisis that will affect education. Your school is trying to decide how many sections of upper-level mathematics classes to offer next year. It is very expensive to offer sections that aren't full, so the school doesn't want to offer any more sections than it absolutely needs to. The assistant principal in charge of scheduling selects a random sample of 60 current sophomores and juniors. Fifty-five of them return the survey, and 48 indicate that they intend to take math during the coming year. If 80% or more of the students actually sign up for math, the school will need to add a section.
- On the basis of the survey data, would you recommend to the assistant principal that an additional class of upper-division mathematics should be scheduled? Give appropriate statistical evidence to support your recommendation.
- Five of the 60 who received surveys failed to return them. If they had returned them, how might it have affected the assistant principal's decision? Explain.

- It is known that the symptoms of adult depression can be treated effectively with either therapy, antidepressants, or a combination of the two. A pharmaceutical company wants to test a new antidepressant against an older medication that has been on the market for several years. One hundred fifty volunteers who have been diagnosed with depression, and who have not been taking any medication for it, are available for the study. This group contains 72 men and 78 women. Sixty of the volunteers have been in therapy for their depression for at least 3 months.
- Design a completely randomized experiment to test the new medication. Include a brief explanation of the randomization process.
- Could the experiment you designed in part (a) be improved by blocking? If so, design an improved study that involves blocking. If not, explain why not.

- The 1970 draft lottery was suspected to be biased toward birthdays later in the year. Because there are 366 possible birthdays, in a fair drawing we would expect to find, each month, an equal number of selections less than or equal to 183 and greater than or equal to 184. The following table shows the data from the 1970 draft lottery.

Do these data give evidence that the 1970 draft lottery was not fair? Give appropriate statistical evidence to support your conclusion.

### Statistics, Section II, Part B, Question 6

Spend about 25 minutes on this part of the exam; percentage of Section II grade: 25.

**Directions:** Show all of your work. Indicate clearly the methods you use because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanation.

- A lake in the Midwest has a restriction on the size of trout caught in the lake. The average length of trout over the years has been 11 inches with a standard deviation of 0.6 inches. The lengths are approximately normally distributed. Because of overfishing during the past few years, any fish under 11.5 inches in length must be released.
- What is the probability that a fisherman will get lucky and catch a fish she can keep? Round your answer to the nearest tenth.
- Design a simulation to determine how many fish the fisherman must catch, on average, in order to be able to take home five trout for dinner.
- Use the table of random digits between parts (d) and (e) of this problem to conduct five trials of your simulation. Show your work directly on the table.
- Based on your simulation, what is your estimate of the average number of fish that need to be caught in order to catch five she can keep?
- What is the theoretical expected number of fish that need to be caught in order to be able to keep five of them?

### Solutions to Practice Exam 1, Section I

- The correct answer is (b). The confidence level isn't mentioned in the problem, but polls often use 95%. If that is the case, we are 95% confident that the true value is within 3.5% of the sample value.
- The correct answer is (b). Since the mean is noticeably greater than the median, the distribution is likely skewed to the right. Another indication of this is the long "whisker" on a boxplot of the five-number summary. IQR = 544 – 502 = 42. An outlier is any value less than 502 – 1.5(42) = 439 or greater than 544 + 1.5(42) = 607. Since the maximum value given (610) is greater than 607, there is at least one outlier.
- The correct answer is (a). The most likely bias would be to influence people to oppose such a law since many voters are resistant to constitutional amendments restricting rights. Compare this to the question, "Do you favor a law that would provide that only marriage between a man and a woman is valid or recognized in California?" which could influence voters to favor the amendment. This was, in fact, a legal issue in California prior to the election. The original title of the amendment was "Limit on Marriage." The Attorney General of California, Jerry Brown, changed the title to include the phrase "eliminates (the) right of same-sex couples to marry." When challenged in court by proponents of the amendment, the title change was upheld based on the fact that the right of same-sex couples to marry had been given full legal status by the state supreme court earlier in the year.
- The correct answer is (d). Choosing your sample from only the homes in your population of interest gives you a larger sample on which to base your confidence interval. If you use Plan A, you will end up with many homes painted with different paint than the paint of interest.
- The correct answer is (e). The *t*-distributions are symmetric about their means. I is the mirror image of *P*(*X* > 65). II, in a continuous distribution, is equivalent to *P*(*X* > 65); this would not be true in a discrete distribution (e.g., a binomial).
- The correct answer is (c). (a) describes a simple random sample. (b) describes a cluster sample. (d) describes stratified random sampling. (e) describes a voluntary response sample.
- The correct answer is (a). There is a natural tendency on the part of a subject in an experiment to want to please the researcher. It is likely that the employees were increasing their production because they wanted to behave in the way they thought they were expected to.
- The correct answer is (c). A Type I error occurs when a true null hypothesis is incorrectly rejected. In this case, that means that the assumption of innocence is rejected, and he is found guilty.
- The correct answer is (b). *P*(A and B) = *P*(A) · *P*(B|A) = (0.4)(0.2) = 0.08. *P*(A or B) = *P*(A) + *P*(B) – *P*(A and B) = 0.4 + 0.3 – 0.08 = 0.62.
- The correct answer is (e). The purpose of randomization is to control for the unknown effects of variables that might affect the response, in this case the differential effects of North or South placement. (a) is incorrect since studies of any size benefit from randomization. (b) is simply nonsense: the number of treatments does not affect the need to randomize. (c) involves blocking and would be correct if we knew in advance that there were differential effects based on a North/South placement, but nothing in the problem indicates this. (d) is not incorrect, but it's not the reason we are randomizing in this situation.
- The correct answer is (e). By definition, a "cluster sample" occurs when a population is divided into groups and then a group or groups is randomly selected. (a) is a simple random sample; (b) is a systematic sample; (c) is a self-selected sample and is not random; (d) is a simple random sample.
- The correct answer is (e). Be clear on the difference between the treatment (type of cat food in this problem) and the blocking variable (breed of cat).
- The correct answer is (b). The box is longer for Network A and the ends of the whiskers are further apart than for Network B, so Network A has a greater range of ratings than Network B. The 3rd quartile, the median, and the 1st quartile of Network A are higher than those of Network B, which can be interpreted to mean that Network A is rated higher than Network B. I is not correct because there is no way to tell how many values are in a boxplot.
- The correct answer is (e). If a significance test rejects *H*_{0} at α = 0.05, then a two-sided 95% confidence interval *will not* contain 0.3. If the finding is not significant at α = 0.04, then a two-sided 96% confidence interval *will* contain 0.3. Hence, any interval with a confidence level of 95% or less will not contain 0.3, and any interval with a confidence level of 96% or higher will.
- The correct answer is (a). Since 0 is in the interval, it is possible that the true difference between the proportions is 0, i.e., that there has not been a significant drop in support. Note that a 90% interval, but not a 95% interval, would contain only positive numbers and would provide statistically significant evidence of a drop in support. (d) is false since it's a one-sided test (with a *P*-value of 0.046) and the confidence interval is two-sided. A two-sided test (*H*_{A}: *p*_{1} ≠ *p*_{2}) would yield the same conclusion (*P* = 0.092) as the confidence interval. (b), (c), and (d) are simply wrong.
- The correct answer is (a). The expected values are: 0.5 × 50 = 25 Golden Retrievers; 0.4 × 50 = 20 Shepherds; and 0.1 × 50 = 5 Others.
- The correct answer is (e). We are 95% confident that the true proportion who will vote for the former actor is in the interval (0.35, 0.42). This means that the true proportion is likely to be in this interval.
- The correct answer is (d). While the central limit theorem argues that the shape of Distribution II will be approximately normal, the sample size for Distribution I is too small for the CLT to apply. The best we can say is that the sampling distribution for a small sample will be similar in shape to the original population (hence, (c) is true). Since we are not given the shape of the original population, we cannot make the claim that the distribution for the smaller sample size will be approximately normal.
- The correct answer is (a). The *P*-value for the two-proportion *z*-test is 0.055. This isn't much over 0.05, but it is enough to say that we do not have, at the 0.05 level, a statistically significant difference between the two findings. Note that (a) and (c) are mutually exclusive. If (a) is correct, which it is, then (c) must be false. (b) is not correct since it does not take into account random variation. While the conclusion in (d) is correct, the statement is not: the raw difference between two values is not what allows us to make conclusions about statistical differences between groups. (e), while it does get at the variability between the sample proportions, doesn't tell us anything by itself.
- The correct answer is (b). The following tree diagram illustrates the situation:
- The correct answer is (e). By the definition of conditional probability, *P*(doesn't take English | does take art) = *P*(doesn't take English and does take art) / *P*(does take art).
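
The 1.5 × IQR outlier rule used in the boxplot solution above (quartiles 502 and 544, maximum 610) can be checked with a few lines of Python. This is a minimal sketch; the function name `outlier_fences` is a hypothetical helper introduced here for illustration, not something from the exam.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and maximum quoted in the solution above.
lower, upper = outlier_fences(502, 544)
print(lower, upper)   # 439.0 607.0
print(610 > upper)    # True: the maximum lies beyond the upper fence
```

Any value below the lower fence or above the upper fence is flagged, which is why a maximum of 610 implies at least one outlier.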

- The correct answer is (d). In general, when testing for a population mean, you should use a *t*-distribution unless the population standard deviation is known, which it rarely is in practice. (a), (b), and (c) are simply incorrect. (e) is a correct statement but is not the reason you would use *t* rather than *z* (in fact, if it argues anything, it argues that there is no practical numerical difference between using *t* or *z* for large samples).
- The correct answer is (a). df = *n* – 2 = 15 – 2 = 13, so *t*\* = 3.012 (if you have a TI-84 with the invT function, invT(0.995, 13) = 3.0123). The standard error of the slope of the regression line (found under "St Dev" after "Age"), *s*_{b}, is 0.07015. The confidence interval, therefore, is 0.00935 ± 3.012(0.07015). (Newer TI-84s have a LinRegTInt in the STAT TESTS menu. However, that is of no help here since the calculator requires that the data be in lists; there is no option to enter summary statistics.)
- The correct answer is (d). To be a simple random sample, every possible sample of size 40 must be equally likely. Only (d) meets this standard. Note that (c) and (e) are perfectly valid ways of collecting a random sample. At the start, each member of the population has an equal chance to be in the sample. But they are not SRSs.
- The correct answer is (a). First find the median. Since there are 10 terms, the median is the mean of the middle two terms. The first quartile is the median of the five terms less than the median (*x*) and the third quartile is the median of the five terms greater than the median (28). The minimum value is 3 and the maximum is *z*.
- The correct answer is (b). Note that the variability of the residuals increases as years of experience increases, so the pattern for *all* years is not truly random. (a) is correct based on the *t*-test for the slope of the regression line, whose *P*-value is given as "<.0001." (c) is correct as the residual of that point is very large compared to most of the points. (d) is correct and is the reason that (b) is not a true statement. (e) is the standard interpretation of the slope of a regression line.
- The correct answer is (b). A confidence interval can be used in place of a significance test in a hypothesis test (for a population mean or the difference between two population means) with a two-sided alternative. In this case, the evidence supports the alternative.
- The correct answer is (b). Since the distributions are independent and approximately normal, we use $\mu_{X-Y} = \mu_X - \mu_Y$ and $\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$. Tom (*T*) has N(0.265, 0.035) and Larry (*L*) has N(0.283, 0.029). Hence, $\mu_{T-L} = 0.265 - 0.283 = -0.018$ and $\sigma_{T-L} = \sqrt{0.035^2 + 0.029^2} \approx 0.045$. *T* – *L* then has the distribution N(–0.018, 0.045). We need to know the probability that *T* – *L* is positive (since we require that *T* be greater than *L*) in this distribution. $P(T - L > 0) = P\left(z > \frac{0 - (-0.018)}{0.045} = 0.4\right) = 0.3446$. On the TI-83/84 calculator, normalcdf(0.4, 100) = 0.3446. Also, normalcdf(0, 100, –0.018, 0.045) = 0.3446.
- The correct answer is (b). The tendency in voluntary response surveys is for people who feel most strongly about an issue to respond. If people are happy in their marriage, they are less likely to respond.
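- The correct answer is (e). The upper critical *z* for a 98% confidence interval is *z*\* = 2.33 (from Table A; on the TI-83/84, invNorm(0.99) = 2.326). In the expression $n \ge \left(\frac{z^*}{M}\right)^2 P^*(1 - P^*)$, where *M* is the desired margin of error, we choose *P*\* = 0.5 since we are not given any reason to choose something else. In this case the "recipe" becomes $n \ge \left(\frac{z^*}{2M}\right)^2$. The sample size needed is $n \ge \left(\frac{2.33}{2(0.03)}\right)^2 = 1508.03$. Choose *n* = 1509. (Note: If you use *z*\* = 2.326 rather than 2.33, you will get *n* ≥ 1502.9, so choose *n* = 1503.)
- The correct answer is (a). Power can be increased by increasing *n*, increasing α, moving the alternative further away from the null, or reducing the variability. This choice provides the best combination of large *n* and large α.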
- The correct answer is (d). The data are paired in that two measurements are being taken at each of 12 different stores. The correct analysis would involve a one-sample *t*-test on the differences (one difference for each of the 12 pairings).
- The correct answer is (d). The correlation coefficient is not affected by any linear transformation of the variables. Changing the units of measurement is a linear transformation.
- The correct answer is (b). The key is to note that there is an outlier, which eliminates (a) and (e), and only one outlier, which eliminates (c). The histogram is skewed to the left, which shows in the boxplot for (b) but not for (d) which is, except for the outlier, more symmetric.
- The correct answer is (a). The mean is pulled in the direction of skewness.
- The correct answer is (c). The data are paired, which means that we are testing *H*_{0}: μ_{d} = 30 vs. *H*_{A}: μ_{d} > 30, where *d* = Before – After for each student. That is, there is only one sample.
- The correct answer is (e).
- The correct answer is (d). The computations are shown in the following table:
- The correct answer is (d). While all of the graphs tend to center around *X*, the true value of the parameter, (d) has the least variability about *X*. Since all of the graphs have roughly the same low bias, the best estimator will be the one with the least variability.
- The correct answer is (d). We are told that the distribution of weight losses is approximately normal, but we are not given the population standard deviation. Hence the most appropriate test is a *t*-test. Now, *t* = –2.53 and df = 50 – 1 = 49, so 0.005 < *P*-value < 0.01 (from Table B, rounding down to df = 40; using the TI-83/84, we have tcdf(–100, –2.53, 49) = 0.0073). Note that the *P*-value for the *z*-test in (a) is quite close. However, a *z*-test is not the *most* appropriate test since we do not know the population standard deviation.
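
Two of the computations in the solutions above, the probability that one normal variable exceeds another (Tom and Larry) and the sample size for a proportion at 98% confidence, can be reproduced with Python's standard library. This is a sketch under stated assumptions: the helper names are hypothetical, and the margin of error of 0.03 is inferred from the stated result of 1508.03 rather than quoted from the problem.

```python
from math import ceil, sqrt
from statistics import NormalDist


def prob_first_exceeds(mu_x, sd_x, mu_y, sd_y):
    """P(X > Y) for independent normal X and Y, via the distribution of X - Y."""
    diff = NormalDist(mu_x - mu_y, sqrt(sd_x ** 2 + sd_y ** 2))
    return 1 - diff.cdf(0)


def sample_size_for_proportion(z_star, margin, p_guess=0.5):
    """Smallest n with n >= (z*/M)^2 * p(1 - p); p = 0.5 is the conservative guess."""
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


# Tom ~ N(0.265, 0.035), Larry ~ N(0.283, 0.029), as given in the solution.
print(round(prob_first_exceeds(0.265, 0.035, 0.283, 0.029), 3))

# Margin of error 0.03 is an assumption inferred from the solution's arithmetic.
print(sample_size_for_proportion(2.33, 0.03))    # 1509
print(sample_size_for_proportion(2.326, 0.03))   # 1503
```

The probability comes out slightly above the 0.3446 quoted in the solution because the code keeps the unrounded standard deviation of the difference instead of rounding it to 0.045 first.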

Now, *P*(student takes art) = 0.45 + 0.08 = 0.53.
