The Matched-Pairs Design for Comparing Two Treatment Means Study Guide
Introduction to The Matched-Pairs Design for Comparing Two Treatment Means Study Guide
Thus far, we have only attempted to set confidence intervals on proportions or means based on a sample from a single treatment or population. Now we want to conduct studies that will allow us to compare the means of two treatments. First, we will think about how best to design a study. In this lesson, after introducing the basic ideas behind matched pairs and two-group designs, we will focus on the analysis of data from the paired design. In the next lesson, we will consider the two-group design.
Two-Group versus Matched-Pairs Design
Suppose we are going to conduct a study to compare two methods of producing shiny children's dress shoes: a standard method and a new method. Fifty children have been randomly selected to participate in the study, and each child will be given a new pair of shiny dress shoes. But first we need to decide how to assign the treatments (production methods) to the children's shoes. One approach is to randomly select 25 (half) of the children to receive shoes made with the standard production process; the other half will receive shoes made with the new process. Thus, both shoes in each child's pair would be made by the same process. A second approach is to have one shoe of each pair made with the standard process and the other shoe made with the new process, with a random choice of whether the right or the left shoe gets the standard process. In this second approach, each child would wear a dress shoe made using each process.
Regardless of which approach is used, the children will wear these shoes whenever they wear dress shoes over the next six months. At the end of the six months, an evaluator who does not know which process was used for each shoe will score the shine quality of each shoe.
Which method of assigning treatments is better? In this case, having each child wear shoes made by both processes is better. Children differ in their activities while wearing dress shoes. Some may wear them only for special occasions, and their shoes will continue to shine no matter which process was used. Other children run and play in their dress shoes; their shoes are less likely to keep their shine, so the process could make a big difference. When each child wears one shoe made by each process, both processes are exposed to the same environment (level of play). The difference in shine after six months is then due mainly to differences between the processes rather than to differences among the children. This is an example of a paired experiment.
The other design, in which half of the children wear dress shoes made by the standard process and half wear shoes made by the new process, is a two-group design. Although this is a reasonable design, it is not the best for this study. The differences we observe in the shine of the shoes after six months are due not only to differences between the processes but also to differences among the children. This extra variation makes the estimated difference in mean shine less precise, so it is more difficult to determine which, if either, process is better.
In the planning stages of a study, it is always important to consider the best way to assign treatments to the study units at random. Pairs should be formed if, by pairing, we can eliminate some of the variability in the response that would otherwise be present. In the blinking study first presented in Lesson 4, the number of blinks each participant made in a two-minute period was measured both during normal conversation and while playing a video game. Participants who tended to blink less than average during normal conversation also tended to blink less than average while playing a video game. Similarly, those who tended to blink more than average during normal conversation tended to blink more than average while playing a video game. By recording the difference in the number of blinks under the two treatments for each person, we eliminate the differences among people, which allows us to measure the difference between the treatments (normal conversation and playing a video game) more accurately.
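A standard probability identity makes this argument precise. Write $X_1$ and $X_2$ for the two responses within a pair, $\sigma_1$ and $\sigma_2$ for their standard deviations, and $\rho$ for their correlation (these symbols are introduced only for this illustration). The variance of a within-pair difference is then

$$
\sigma_D^2 = \operatorname{Var}(X_1 - X_2) = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2 .
$$

When the responses within a pair are positively correlated, as with the blinking data, the last term removes part of the variability. If the pairing variable is unrelated to the response, $\rho$ is essentially zero and pairing removes nothing.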
Sometimes, it is not reasonable for both treatments to be applied to the same person. In this case, we may want to pair units by some factor that helps explain the variability in the response. For example, suppose we want to compare two treatments for lowering cholesterol. We could pair patients by their initial cholesterol levels: the two patients with the highest levels would form the first pair, the two with the next highest levels would form the second pair, and so forth. Then, within each pair, one patient would be randomly assigned to the first treatment, and the other would receive the second treatment.
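The pairing and randomization steps just described can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `assign_within_pairs`, the dictionary input format, and the cholesterol values are all hypothetical. Patients are sorted by initial level, neighbors are paired, and a separate coin flip in each pair decides who receives the first treatment.

```python
import random


def assign_within_pairs(baseline, seed=None):
    """Pair units by a baseline measurement, then randomize treatments within each pair.

    baseline: dict mapping a (hypothetical) patient ID to an initial cholesterol level.
    Returns one (gets_treatment_1, gets_treatment_2) tuple of IDs per pair.
    """
    if len(baseline) % 2 != 0:
        raise ValueError("an even number of patients is needed to form pairs")
    rng = random.Random(seed)
    # Highest level first, so adjacent patients (who end up paired) have similar levels.
    ordered = sorted(baseline, key=baseline.get, reverse=True)
    pairs = []
    for i in range(0, len(ordered), 2):
        pair = [ordered[i], ordered[i + 1]]
        # A separate coin flip for every pair decides who gets treatment 1.
        if rng.random() < 0.5:
            pair.reverse()
        pairs.append(tuple(pair))
    return pairs


# Made-up cholesterol levels, for illustration only.
levels = {"A": 262, "B": 241, "C": 288, "D": 250, "E": 230, "F": 275}
for t1, t2 in assign_within_pairs(levels, seed=1):
    print(f"treatment 1: {t1}   treatment 2: {t2}")
```

Because the coin is flipped anew for each pair, no systematic rule (such as "the higher-cholesterol patient always gets treatment 1") can creep into the assignment.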
Whether or not to use pairing is an important consideration. Matched pairs should be formed only if the researcher believes that the pairing variable explains a substantial part of the variability in the response, so that differences between the treatments can be detected more readily. As an illustration, suppose we decided to pair patients in the cholesterol study on the basis of foot length: the two with the longest feet would form the first pair, the two with the next longest feet the second pair, and so on. We have no reason to believe that foot length is related to cholesterol level in any way. Pairing in such a situation provides no benefit, and it is less effective than the two-group design for assessing whether the treatment means differ.
Example
A researcher wants to compare the quality of roasts cooked by two methods: open pan and bag. Four ovens are available for the study, and eight roasts of equal quality have been allocated to it.
- Describe how to conduct the study using a matched-pairs design.
- Describe how to conduct the study using a two-group design.
- Which of the two designs would you use for this study? Explain.
Solution
- For a matched-pairs design, two roasts would be cooked in each oven, one in an open pan and the other in a bag. The location of each roast within the oven would be randomly determined, separately for each oven.
- For a two-group design, we would randomly select two of the ovens, each to cook one roast using the open-pan method; each of the other two ovens would cook one roast in a bag.
- The matched-pairs design would be best for this study. Ovens often vary in their ability to hold a specified temperature. With both treatments in each oven, differences between ovens can be accounted for in the analysis. As described, the two-group design uses only half as many roasts (four of the eight). We could instead put two roasts in each oven and cook both with that oven's assigned method. This gives us information on the variation within an oven and allows us to estimate more precisely the quality of roasts cooked in a particular oven. However, cooking two roasts in the same oven does not double the number of experimental units in the study: in the two-group design, the oven is the experimental unit because the cooking methods are randomly assigned to the ovens.
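The two randomization schemes in this solution can be contrasted in a short Python sketch. It assumes, purely for illustration, that each oven has a "left" and a "right" rack position; the function names and oven labels are hypothetical. In the matched-pairs plan, each oven gets its own random draw for where the open-pan roast goes; in the two-group plan, two of the four ovens are chosen at random for the open pan.

```python
import random

OVENS = ["oven 1", "oven 2", "oven 3", "oven 4"]


def matched_pairs_plan(ovens, rng):
    """Two roasts per oven, one per method; a separate random draw per oven
    decides which (hypothetical) rack position gets the open-pan roast."""
    plan = {}
    for oven in ovens:
        positions = ["left", "right"]
        rng.shuffle(positions)  # independent randomization within each oven
        plan[oven] = {"open pan": positions[0], "bag": positions[1]}
    return plan


def two_group_plan(ovens, rng):
    """One roast per oven; half the ovens, chosen at random, use the open pan."""
    open_pan_ovens = set(rng.sample(ovens, k=len(ovens) // 2))
    return {oven: ("open pan" if oven in open_pan_ovens else "bag") for oven in ovens}


rng = random.Random(2024)
print(matched_pairs_plan(OVENS, rng))
print(two_group_plan(OVENS, rng))
```

In the second plan, each oven carries only one treatment, which is why the oven, not the roast, is the experimental unit.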
Once we have decided to conduct an experiment using matched pairs, how do we actually go about conducting the study? First, the study units need to be obtained. As we learned in Lesson 2, if the study units are randomly selected from some population, conclusions can be made for that population at the end of the study; otherwise, conclusions apply only to the units in the study. In the shoe-shine study, children were randomly selected. The group from which these children were randomly selected is the population for which inferences can be made.
Next, the study units need to be paired. Individuals could be matched on a characteristic that explains some of the variability in the response variable; in the cholesterol study, individuals were matched by initial cholesterol level. Sometimes, both treatments can be applied sequentially to the same individual. This form of matched pairs is often very effective, but it may require more time than is available for the study.
Once the pairs are formed, one treatment is randomly assigned to one unit in each pair; the other unit receives the second treatment. Notice that a separate randomization is used for each pair. For the shoe-shine study, it would not be sufficient to flip a single coin and assign the first treatment to all right shoes and the other treatment to all left shoes. Children are right- or left-footed just as they are right- or left-handed. It is possible that one shoe, say the right shoe, tends to get more wear because most children are right-footed. If so, the treatment assigned to all right shoes would be at a disadvantage in the study. To avoid this and other biases of which we may not even be aware, we randomly assign treatments within each pair.
Once the study is complete, we record the response variable for each unit. Let $X_{1i}$ be the observed response from the first treatment in pair $i$, $i = 1, 2, \ldots, n$, where there are $n$ pairs. Similarly, let $X_{2i}$ be the observed response from the second treatment in pair $i$, $i = 1, 2, \ldots, n$. Then $D_i = X_{1i} - X_{2i}$, $i = 1, 2, \ldots, n$, is the observed difference between the two treatments for the $i$th pair. There is a conceptual population of $D_i$'s consisting of the differences for all possible pairs that could have been used in this study. This population has mean $\mu_D$ and standard deviation $\sigma_D$.
The sample mean difference between the two treatments, $\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$, is an estimate of the difference in the treatment means, $\mu_1 - \mu_2 = \mu_D$, the mean of the population of paired treatment differences. The sample variance of the pairwise differences provides an estimate of $\sigma_D^2$ and is $s_D^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2$. The sample standard deviation is $s_D = \sqrt{s_D^2}$. Notice that $\bar{D}$ and $s_D$ are, respectively, the sample mean and sample standard deviation of the differences. This would lead us to speculate that the standard error of $\bar{D}$ is $s_D/\sqrt{n}$. This is, in fact, the case! The analysis of a paired study is based on these quantities. We will consider this further in the next two lessons.
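Because the paired analysis reduces to a one-sample summary of the differences, the quantities $\bar{D}$, $s_D$, and $s_D/\sqrt{n}$ can be computed directly. The sketch below is a minimal Python version; the function name `paired_summary` and the four pairs of responses are hypothetical. `statistics.stdev` uses the $n - 1$ divisor, matching $s_D^2$.

```python
import math
import statistics


def paired_summary(x1, x2):
    """Summarize a matched-pairs study from the two lists of responses.

    x1[i] and x2[i] are the responses to treatments 1 and 2 in pair i.
    Returns (D-bar, s_D, standard error of D-bar), where D_i = x1[i] - x2[i].
    """
    if len(x1) != len(x2):
        raise ValueError("each pair needs one response per treatment")
    d = [a - b for a, b in zip(x1, x2)]   # pairwise differences D_i
    d_bar = statistics.mean(d)            # estimates mu_D = mu_1 - mu_2
    s_d = statistics.stdev(d)             # divisor n - 1, as in s_D^2
    se = s_d / math.sqrt(len(d))          # s_D / sqrt(n)
    return d_bar, s_d, se


# Hypothetical responses for four pairs (not data from this lesson).
d_bar, s_d, se = paired_summary([12, 15, 11, 14], [10, 14, 11, 11])
print(f"{d_bar:.3f} {s_d:.3f} {se:.3f}")  # prints 1.500 1.291 0.645
```

Here the differences are 2, 1, 0, and 3, so $\bar{D} = 1.5$, $s_D = \sqrt{5/3} \approx 1.291$, and the standard error is $1.291/\sqrt{4} \approx 0.645$.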
Example
An athletic shoe company believes it has developed a shoe that will help short-distance runners lower their race times. The company recruited 24 runners and gave each runner a pair of the new athletic shoes. The runners were encouraged to use these shoes and their favorite pair of running shoes equally in practice for two weeks. After two weeks, each runner ran two 100-meter dashes, five hours apart. For each runner, a coin was flipped: if it landed heads up, the runner wore his or her favorite running shoes in the first race; otherwise, the runner wore the newly developed shoes. In the second race, each runner wore the pair of shoes that was not used in the first race. The runners' times are given in Table 18.1.
- Explain why this study has a matched-pairs design. Include a clear statement describing what constitutes a pair.
- Find the difference in observations from each pair.
- Estimate the mean and standard deviation of the differences in time to run a 100-meter dash when wearing the favorite running shoes compared to the new running shoes.
- Find the standard error of the estimated mean of the differences in time to run a 100-meter dash when wearing the favorite running shoes compared to the new running shoes.
- Is the assumption reasonable that the differences are normally distributed?
- To which population may inference be drawn from this study?
Solution
- The two treatments are the favorite running shoes and the newly developed running shoes. Both treatments are applied to each runner, so the two kinds of shoes are paired by runner: a pair consists of the running times for the two treatments from a single runner. The order in which the shoes were worn was randomized separately for each runner, a critical step in conducting the study.
- The differences in the two treatments are computed for each runner (see Table 18.2).
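- The estimated mean difference in the running times using the favorite shoes versus using the newly developed shoes is $\bar{D} = \frac{1}{24}(-0.06 + 0.64 + \cdots + 0.34)$. The estimated variance of these differences is $s_D^2 = \frac{1}{23}\sum_{i=1}^{24}\left(D_i - \bar{D}\right)^2 \approx 0.0941$, and the estimated standard deviation is $s_D = 0.3068$.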
- The standard error of the estimated mean difference between the two treatments is $s_D/\sqrt{n} = 0.3068/\sqrt{24} \approx 0.0626$.
- Because the sample size is small, it is difficult to determine whether or not the observed differences are normally distributed. Although formal tests exist for determining normality, we will not study them here. Instead, we will rely on examining graphs to determine whether there are indications that the data may not be normal. Figures 18.1, 18.2, and 18.3 show a histogram, a dotplot, and a boxplot, respectively. The histogram looks fairly symmetric and unimodal. With only 24 observations, the shape of a population is often not fully captured in a histogram of the data. The dotplot appears to be centered at about 0.20. The values range from –0.39 to 0.84 with a higher concentration of dots in the center. From the boxplot, the data appear to be fairly symmetric without any outliers. In summary, we do not see any indication of skewness, outliers, or other features that would cause us to think that the assumption of normality is unreasonable.
- Because the runners were recruited and not randomly selected from some population, the population to which inference may be drawn is the runners in this study.
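The boxplot's outlier screen used in the normality check can also be applied numerically. The sketch below is one assumed implementation of the common 1.5 × IQR fence rule; the function name `boxplot_outliers` and the sample values are hypothetical, since the runners' differences (Table 18.2) are not reproduced in this text. Like the plots, it only screens for features that cast doubt on normality; it is not a formal normality test.

```python
import statistics


def boxplot_outliers(values):
    """Return (points outside the 1.5 x IQR fences, (lower fence, upper fence)).

    Quartiles come from statistics.quantiles(..., method="inclusive"); other
    software may use a slightly different quartile rule, so points very close
    to a fence can be classified differently elsewhere.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)


# Hypothetical differences, with one deliberately extreme value.
flagged, fences = boxplot_outliers([0.10, 0.20, 0.25, 0.30, 0.15, 1.90])
print(flagged)  # prints [1.9]
```

An empty list of flagged points is consistent with a boxplot that shows no outliers, but skewness or multiple modes still have to be judged from the histogram and dotplot.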