The term meta-analysis, coined by Gene Glass in 1976, refers to a statistical technique used to synthesize data from separate comparable studies in order to obtain a quantitative summary of research addressing a common question. In 1904 Karl Pearson published what is believed to be the first meta-analysis, examining the effectiveness of a vaccine against typhoid. In 1932 Ronald Fisher, in his classic text Statistical Methods for Research Workers, presented a technique for combining the p values that came from statistically independent tests of the same hypothesis. Meta-analysis did not gain widespread use, however, until the 1960s brought tremendous growth in social scientific research and increasing interest in its social policy implications (Chalmers, Hedges, & Cooper, 2002). During the 1970s and 1980s, many of the techniques first invented by Pearson and Fisher were rediscovered, and more sophisticated techniques were developed in the work of Gene Glass, Barry McGaw, and Mary Lee Smith (1981), John Hunter, Frank Schmidt, and Gregg Jackson (1982), Robert Rosenthal (1984), and Larry Hedges and Ingram Olkin (1985).
Prior to the widespread use of meta-analysis, researchers relied on a narrative approach to summarize and integrate research on a specific topic. Traditional narrative reviews have been criticized because, although they can provide a meticulous list of multiple tests of a hypothesis, they often fail to fully and accurately integrate findings and are prone to allowing the biases of the reviewer to enter into conclusions. Like the narrative review, a meta-analysis aims to summarize the results of past studies, suggest potential reasons for inconsistencies in past research findings, and direct future investigations. Although the goals are the same, many limitations of the narrative review can be addressed by using statistical procedures to combine the results of previous studies (Cooper & Rosenthal, 1980). In the early 2000s, meta-analysis is an accepted and respected technique across many disciplines, from psychology and education to medicine and public policy.
Prior to conducting a meta-analysis, the researcher must first define the problem to be addressed by the meta-analysis, collect research relevant to the problem, and evaluate the quality of the data (Cooper, 1998). After these steps have been completed, then a meta-analysis can be conducted and the results interpreted.
Often the purpose of a meta-analysis is to answer three questions. First, does variable X have an effect on variable Y? Second, how much of an effect does variable X have on Y? Finally, are there moderating variables that can explain why the effect of X on Y varies from one study to the next? To answer the first two questions, meta-analysts will (a) calculate an effect size for the outcomes of hypothesis tests in every study and (b) average these effect sizes across hypothesis tests to estimate general magnitudes of effect and calculate confidence intervals as a test of the null hypothesis. In order to examine the question of moderating variables, meta-analysts will also (c) conduct homogeneity analyses in order to assess whether variations in outcomes exist and what features of comparisons might account for that variation, if it exists. The procedures for conducting a meta-analysis are described in detail in Cooper (1998), Cooper and Hedges (1994), Hedges and Olkin (1985), and Lipsey and Wilson (2001).
Estimating Effect Sizes. Cohen (1988) defined an effect size as “the degree to which the phenomenon is present in the population, or the degree to which the null hypothesis is false” (pp. 9–10). There are many different metrics to describe an effect size. Generally, each metric is associated with particular research designs. Although numerous estimates of effect size are available, three dominate the literature. The first, called the d-index by Cohen (1988), is a scale-free measure of the separation between two group means that is used when one variable in the relation is dichotomous and the other is continuous. Calculating the d-index for any study involves dividing the difference between the two group means by either their average standard deviation or the standard deviation of the control group. Another effect size metric is the r-index, or the Pearson product-moment correlation coefficient. Typically it is used to measure the degree of linear relation between two continuous variables. The third effect size metric is the odds ratio. The odds ratio is applicable when both variables are dichotomous and findings are presented as frequencies or proportions. The index is often used in studies of educational interventions when the outcome of interest is drop-out or retention rates.
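To make these metrics concrete, the following Python sketch computes a d-index and an odds ratio from summary statistics (the r-index is simply Pearson's correlation, available as np.corrcoef). It uses the pooled-standard-deviation variant of the d-index noted above; all input values are hypothetical.

```python
import numpy as np

def cohen_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """d-index: difference between group means divided by the pooled SD."""
    pooled_sd = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                        / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 frequency table:
    a/b = treatment group with/without the outcome, c/d = control group."""
    return (a * d) / (b * c)

# Hypothetical summary statistics from a single study
print(cohen_d(105, 100, 15, 15, 50, 50))   # ~0.33 standard deviations
print(odds_ratio(30, 20, 15, 35))          # 3.5, e.g., odds of retention
# the r-index for two continuous variables is np.corrcoef(x, y)[0, 1]
```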
Averaging Effect Sizes. The primary findings of meta-analyses are the average effect sizes and the measures of dispersion that accompany them. State-of-the-art meta-analytic procedures call for the weighting of effect sizes when they are averaged across studies. In the weighted procedure, each independent effect size is first multiplied by the inverse of its variance, and the sum of these products is then divided by the sum of the inverses. The weighted procedure is generally preferred because it gives greater weight to effect sizes based on larger samples, and larger samples give more precise population estimates. Confidence intervals are then calculated to test the null hypothesis that the difference between two means, or the size of a correlation or odds ratio, is zero.
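A minimal sketch of this inverse-variance procedure, using hypothetical d-indexes and sampling variances; the 95% confidence interval serves as the test of the null hypothesis that the mean effect is zero.

```python
import numpy as np
from scipy import stats

def inverse_variance_mean(effects, variances, alpha=0.05):
    """Fixed-effect weighted mean: each effect size is weighted by 1/variance."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))           # standard error of the weighted mean
    z = stats.norm.ppf(1 - alpha / 2)
    return mean, (mean - z * se, mean + z * se)

# Three hypothetical d-indexes with their sampling variances
d = [0.40, 0.10, 0.55]
v = [0.04, 0.02, 0.08]
mean, ci = inverse_variance_mean(d, v)
print(mean, ci)  # ~0.25, (0.04, 0.46): the CI excludes zero
```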
Homogeneity Analyses. In addition to the confidence interval as a measure of dispersion, meta-analysts usually carry out homogeneity analyses. Homogeneity analyses allow the meta-analyst to explore whether effect sizes vary from one study to the next. A homogeneity analysis compares the amount of variance in an observed set of effect sizes with the amount of variance that would be expected from sampling error alone, and estimates the probability that the observed variation in effect sizes would occur if only sampling error were making them differ. If there is greater variation in effects than would be expected by chance, then the meta-analyst can begin the process of examining moderators of comparison outcomes.
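The usual statistic for this comparison is Cochran's Q, the weighted sum of squared deviations of the effect sizes from their weighted mean; under the null hypothesis of sampling error alone, Q follows a chi-square distribution with k − 1 degrees of freedom. A sketch, reusing the hypothetical values above:

```python
import numpy as np
from scipy import stats

def q_statistic(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled mean.
    Under the null (sampling error only), Q ~ chi-square with k - 1 df."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mean) ** 2)
    df = len(effects) - 1
    return q, stats.chi2.sf(q, df)

q, p = q_statistic([0.40, 0.10, 0.55], [0.04, 0.02, 0.08])
print(q, p)  # a small p suggests more variation than sampling error alone
```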
Moderator Analyses. Homogeneity analyses also allow the meta-analyst to test hypotheses about why the outcomes of studies differ. First, the meta-analyst calculates average effect sizes for different subgroups of studies, comparing the average effect sizes for different methods, types of programs, outcome measures, or participants. Then, homogeneity analyses are used to statistically test whether these factors are reliably associated with different magnitudes of effect. As previously suggested, homogeneity analyses assess whether sampling error alone accounts for variation or whether the features of studies, samples, treatment designs, or outcome measures also explain variations in the strength and/or direction of effect sizes across various groupings. This test is analogous to conducting an analysis of variance, in that a significant homogeneity statistic indicates that at least one group mean differs from the others.
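One common way to implement this test is to partition Q into between-groups and within-groups components. The sketch below computes the between-groups portion under fixed-effect assumptions; the moderator (type of assignment) and all values are hypothetical.

```python
import numpy as np
from scipy import stats

def q_between(effects, variances, groups):
    """Between-groups homogeneity statistic: tests whether subgroup mean
    effects differ (analogous to a one-way ANOVA on the effect sizes)."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    grand = np.sum(w * effects) / np.sum(w)
    qb = 0.0
    labels = set(groups)
    for g in labels:
        idx = [i for i, lab in enumerate(groups) if lab == g]
        wg, eg = w[idx], effects[idx]
        gmean = np.sum(wg * eg) / np.sum(wg)
        qb += np.sum(wg) * (gmean - grand) ** 2
    df = len(labels) - 1
    return qb, stats.chi2.sf(qb, df)

# e.g., comparing studies that used random vs. matched assignment
qb, p = q_between([0.40, 0.10, 0.55, 0.20], [0.04, 0.02, 0.08, 0.05],
                  ["random", "matched", "random", "matched"])
print(qb, p)  # a small p suggests the moderator explains outcome variation
```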
Alternatively, meta-regression can be used to examine whether particular characteristics of studies are related to the sizes of the treatment effect. However, unlike the strategy previously discussed, meta-regression allows the meta-analyst to explore the relationship between continuous, as well as categorical, characteristics and effect size, and allows the effects of multiple factors to be investigated simultaneously (Thompson & Higgins, 2002).
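In its simplest fixed-effect form, meta-regression reduces to weighted least squares with inverse-variance weights. The sketch below implements that minimal version directly in numpy; full implementations also model residual heterogeneity, and the duration moderator and all values here are hypothetical.

```python
import numpy as np

def meta_regression(effects, variances, moderator):
    """Weighted least squares with inverse-variance weights: a minimal
    fixed-effect meta-regression (no residual heterogeneity term)."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(moderator, dtype=float)])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta  # intercept and slope

# e.g., does treatment duration (in weeks, hypothetical) predict effect size?
print(meta_regression([0.40, 0.10, 0.55], [0.04, 0.02, 0.08], [8, 2, 12]))
```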
When conducting primary research, investigators encounter decision points at which they have multiple choices about how to proceed; the same is true in meta-analysis. One such decision concerns how to handle multiple effect sizes coming from the same sample, since these violate the assumption of most meta-analytic procedures that effect sizes are independent data points. Meta-analysts employ multiple approaches to handling non-independent tests. Some treat each effect size as independent, regardless of the number that come from the same study, so that no within-study information is lost. However, this strategy violates the assumption that the estimates are independent, and the studies will not be weighted equally in any overall conclusion about results. Rather, each study will contribute to the overall effect in proportion to the number of statistical tests it contains.
Others use the study as the unit of analysis. In this strategy, a mean or median result is calculated to represent the study. This strategy ensures that the assumption of independence is not violated and that each study contributes equally to the overall effect. However, some within-study information may be lost in this approach. As of 2007 the preferred approach is to use a shifting unit of analysis (Cooper, 1998). Here, each study is allowed to contribute as many effects as there are categories in the given analysis, but effects within any category are averaged. The shifting unit of analysis retains as much data as possible from each study while holding to a minimum any violations of the assumption that data points are independent.
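A small sketch of the shifting-unit logic, using hypothetical records: when outcome type is the moderator being analyzed, each study contributes at most one averaged effect per outcome category.

```python
from collections import defaultdict
from statistics import mean

# Each record: (study id, category level, effect size) -- hypothetical data
records = [("Study A", "math", 0.40), ("Study A", "math", 0.50),
           ("Study A", "reading", 0.10), ("Study B", "math", 0.20)]

def shifting_unit(records):
    """Average effects within each (study, category) cell so every study
    contributes at most one effect per level of the current moderator."""
    cells = defaultdict(list)
    for study, level, es in records:
        cells[(study, level)].append(es)
    return {cell: mean(vals) for cell, vals in cells.items()}

print(shifting_unit(records))
# Study A contributes one averaged math effect (0.45) and one reading effect
```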
Meta-analysts also have to decide whether a fixed-effects or random-effects model of error underlies the generation of study outcomes (Hedges & Vevea, 1998). In a fixed-effects model of error, all studies are assumed to be drawn from a common population. As such, variance in effect sizes is assumed to reflect only sampling error, that is, error solely due to participant differences. In a random-effects model of error, studies are expected to vary also as a function of features that can be viewed as random influences. Thus, in a random-effects analysis, study-level variance is assumed to be present as an additional source of random influence. If the meta-analyst suspects a large number of these additional sources of random error, then a random-effects model is most appropriate because it takes these sources of variance into account. If the meta-analyst suspects that the data are likely little affected by other sources of random variance, then a fixed-effects model can be applied. However, it is often difficult to decide whether there may be sources of random error affecting results. Consequently, the most conservative approach is to conduct all analyses twice, once employing fixed-effects assumptions and once using random-effects assumptions. Differences in results based on which set of assumptions is used can then be incorporated into the interpretation and discussion of findings.
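One common way to operationalize the random-effects model is to estimate the between-study variance (tau-squared) and add it to each study's sampling variance before weighting. The sketch below uses the DerSimonian-Laird method-of-moments estimator; the source does not name a specific estimator, so this is offered as one standard choice, with the same hypothetical data as above.

```python
import numpy as np

def dl_tau_squared(effects, variances):
    """DerSimonian-Laird estimate of between-study variance (tau^2)."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mean) ** 2)      # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    return max(0.0, (q - df) / c)

def random_effects_mean(effects, variances):
    """Random-effects weights add tau^2 to each study's sampling variance."""
    tau2 = dl_tau_squared(effects, variances)
    w = 1.0 / (np.asarray(variances, dtype=float) + tau2)
    return np.sum(w * np.asarray(effects, dtype=float)) / np.sum(w)

d, v = [0.40, 0.10, 0.55], [0.04, 0.02, 0.08]
print(random_effects_mean(d, v))  # compare with the fixed-effect mean above
```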
Although the inverse-variance method described above, which derives from Hedges and Olkin (1985), is the most widely used, alternative approaches to meta-analysis exist. In particular, approaches deriving from Rosenthal (1984) and from Hunter, Schmidt, and Jackson (1982) are commonly used.
Like the inverse-variance method, the Rosenthal technique converts study findings into a standard index of effect and combines them to produce weighted means. However, unlike the Hedges and Olkin approach, Rosenthal estimates the overall significance of the effect by combining the probabilities associated with the individual effect sizes. Further, heterogeneity is examined informally using diffuse and focused comparisons.
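One standard method for combining probabilities in this tradition is the Stouffer Z approach, sketched below with hypothetical one-tailed p values. Note that the combined p can be significant even when no individual test reaches the .05 level.

```python
import numpy as np
from scipy import stats

def stouffer_combined_p(p_values):
    """Stouffer's method: convert each one-tailed p to a Z score, sum the
    Zs, divide by sqrt(k), and convert back to a combined one-tailed p."""
    p = np.asarray(p_values, dtype=float)
    z = stats.norm.isf(p)                 # Z score for each p
    z_combined = z.sum() / np.sqrt(len(p))
    return stats.norm.sf(z_combined)

print(stouffer_combined_p([0.06, 0.10, 0.08]))  # ~0.007, jointly significant
```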
In the Hunter, Schmidt, and Jackson (1982; Hunter & Schmidt, 2004) approach, study findings are also converted into a standard index of effect and combined to produce weighted means. However, untransformed effect size estimates weighted by the sample size of each study are used to compute the weighted mean effect size. Further, heterogeneity in effect sizes across studies is examined by comparing the observed variation in obtained effect sizes with the variation expected due to sampling error, that is, the variance expected if all observed effects were estimating the same underlying population value. A formal statistical test of the difference between these two values is typically not carried out; rather, the meta-analyst adopts a critical value for the ratio of observed-to-expected variance as a criterion for rejecting the null hypothesis. In this approach, the meta-analyst might also adjust effect sizes to account for methodological artifacts such as sampling error, range restriction, or unreliability of measurements. This method has been applied most often in industrial and organizational psychology.
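A "bare-bones" version of this approach for correlations (omitting the artifact corrections) can be sketched as follows; the studies' correlations and sample sizes are hypothetical, and the sampling-error variance formula is the common simplified form based on the mean sample size.

```python
import numpy as np

def hunter_schmidt(rs, ns):
    """Bare-bones Hunter-Schmidt analysis for correlations: sample-size-
    weighted mean r, observed variance, and variance expected from
    sampling error alone."""
    rs, ns = np.asarray(rs, dtype=float), np.asarray(ns, dtype=float)
    r_bar = np.sum(ns * rs) / np.sum(ns)
    var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)
    var_exp = (1 - r_bar**2) ** 2 / (ns.mean() - 1)  # sampling-error variance
    return r_bar, var_obs, var_exp

r_bar, var_obs, var_exp = hunter_schmidt([0.30, 0.15, 0.42], [60, 150, 45])
print(r_bar, var_obs / var_exp)  # ratio near 1: sampling error explains
                                 # nearly all the observed variation
```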
There are a number of advantages of meta-analysis over traditional narrative techniques for synthesizing research (see Rosenthal & DiMatteo, 2001, for a full review). First, the structured methodology of meta-analysis requires careful review and analysis of all contributing research. As such, meta-analysis overcomes much of the bias associated with the reliance on single studies or subsets of studies that inevitably occurs in narrative reviews of a literature. Second, meta-analysis allows even small and non-significant effects to contribute to the overall conclusions, avoiding the waste of data from studies whose samples were too small for the effect to reach significance.
Third, meta-analysis allows the synthesist to ask questions about variables that moderate effects. Specifically, even if no individual study has compared results of different methods, types of programs, outcome measures, or participants, by comparing results across studies the synthesist can get a first hint about whether these variables would be important to look at in future research and/or as guides to policy. Without the aid of statistics, the synthesist simply examines the differences in outcomes across studies, groups them informally by study features, and decides whether the feature is a significant predictor of variation in outcomes. At best, this method is imprecise. At worst, it leads to incorrect inferences. In contrast, meta-analysis provides a formal means for testing whether different features of studies explain variation in their outcomes.
Despite many advantages, meta-analysis has been criticized for a number of legitimate limitations and concerns. First, while many meta-analysts go to great lengths to locate as much relevant research as possible, missing data, whether a result of the synthesist's literature search procedures or of data censoring on the part of primary researchers, editors, or publishers, is often inevitable. When data are systematically missing, not only is the size of the sample gathered for the research synthesis reduced, but the representativeness of the sample and the validity of the results are compromised, regardless of the quality of the meta-analysis in all other respects (Rothstein, Sutton, & Borenstein, 2005). A number of techniques have been developed to assess the possible presence of data censoring and the implications of this threat for the validity of the conclusions drawn from the meta-analysis (see Rothstein, Sutton, & Borenstein, 2005, for a full review).
Second, meta-analysis is sometimes criticized for combining research of varying quality that uses various methods and samples. Because a meta-analysis is only as good as the primary research it cumulates, it is important that the meta-analyst believe that each finding tests the same relationship and that the primary researchers made valid assumptions when they computed the results of their statistical tests. Of course, research quality can also be used as a moderator variable in the meta-analysis.
Third, while educational psychology research often examines the combination and interaction of many variables in multifactorial models, including regression analyses, meta-analysis is focused on individual effects. Consequently, there is some loss of information in meta-analysis because it remains difficult to include studies in which complex models were used to analyze data.
Finally, synthesis-based evidence should not be interpreted as supporting statements about causality. When different study characteristics are found to be associated with the effects of a treatment, the synthesist should recommend that future researchers examine these factors within a single experiment.
With the ever-growing volume of primary research on various education-related topics, meta-analysis has become an essential tool for school policy makers and practitioners coping with the overwhelming number of results. In the early 2000s, meta-analysis is often used to guide policy and practice in the classroom. Topics have ranged from the effectiveness of homework, access to special education, and the relationship between class size and achievement to the effect of rewards on intrinsic motivation and the relationship between race and achievement motivation. Often a synthesis of the current research leaves as many questions unanswered as it answers; even so, it provides a comprehensive guide to direct future research. Clearly, the synthesist faces a number of complex issues in conducting a meta-analysis. However, if social science research is to contribute to rational decision making, then rigorous, systematic syntheses of research are a critical component of researchers' methodological toolbox, and meta-analysis facilitates the attainment of these standards.
See also: Research Methods: An Overview
Chalmers, I., Hedges, L. V., & Cooper, H. (2002). A brief history of research synthesis. Evaluation & the Health Professions, 25, 12–37.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cooper, H. M. (1998). Synthesizing research: A guide for literature reviews (3rd ed.). Thousand Oaks, CA: Sage.
Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Cooper, H. M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442–449.
Fisher, R. A. (1932). Statistical methods for research workers (4th ed.). London: Oliver & Boyd.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486–504.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243–1246.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rosenthal, R., & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology, 52, 59–82.
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. London: Wiley.
Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559–1573.