Correlational research is an important form of educational and psychological research. Some knowledge of correlational methods is important for both the consumption and conduct of research. The purpose of this entry is to (a) define quantitative research methods as a way of framing correlational research, (b) consider multivariate extensions of the bivariate correlation, including statistical methods for analyzing correlational research data, (c) provide some relevant examples of correlational research, (d) discuss the role of correlational research, and (e) mention some key issues associated with correlational research.
Research in education and psychology can be roughly divided into quantitative research, qualitative research, and historical research. Quantitative research methods can be categorized as descriptive research, correlational research, and experimental research.
Descriptive research describes the phenomena being studied. Data are gathered and descriptive statistics are then used to analyze such data. Thus descriptive research considers one variable at a time (i.e., univariate analysis), and is typically the entry-level type of research in a new area of inquiry. Descriptive research typically describes what appears to be happening and what the important variables seem to be.
The purpose of correlational research is to determine the relations among two or more variables. Data are gathered from multiple variables and correlational statistical techniques are then applied to the data. Thus correlational research is a bit more complicated than descriptive research; after the important variables have been identified, the relations among those variables are investigated. Correlational research investigates a range of factors, including the nature of the relationship between two or more variables and the theoretical model that might be developed and tested to explain these resultant correlations. Correlation does not imply causation. Thus correlational research can only enable the researcher to make weak causal inferences at best.
In experimental research, the researcher manipulates one or more independent or grouping variables (e.g., by comparing treatment conditions, such as an intervention group vs. a control group) and then observes the impact of that manipulation on one or more dependent or outcome variables (e.g., student achievement or motivation). The statistical method of analysis is typically some form of the analysis of variance. Experimental research includes (a) true experiments (in which individuals are randomly assigned to conditions or groups, such as method of instruction or counseling) and (b) quasi-experiments (in which individuals cannot be randomly assigned as they are already in a condition or group, such as gender, socioeconomic status, or classroom). The basic question to be posed in experimental research concerns the extent to which a particular intervention causes a particular outcome. Thus experimental studies are those in which strong causal inferences are most likely to be drawn.
There are a number of different methods in which correlations can be considered. Each of these methods is directly tied to a particular statistical technique (with names and dates of their initial development). Thus these methods and statistical techniques can be considered together. At the most basic level is a bivariate correlation (contributions by Galton, 1888; Edgeworth, 1892; Pearson, 1900), which examines the correlation or relation between two variables (hence the terms co-relation and bivariate). In some cases one variable is known as an independent variable (or input variable) and the second variable as a dependent variable (or outcome variable). In other cases there are two variables without any such designation. Bivariate correlations provide information about both the strength of the relationship (from uncorrelated, when the correlation is zero, to perfectly correlated, when the correlation is positive or negative one), and the direction of the relationship (positive or negative). A bivariate correlation can only consider two variables at a time. However, there are a number of multivariate extensions to the bivariate correlation in which more than two variables can be simultaneously analyzed.
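The strength and direction conveyed by a bivariate correlation can be illustrated with a minimal Python sketch of the Pearson product-moment coefficient. The function name `pearson_r` and the student data are hypothetical, chosen only for illustration; they do not come from any study cited in this entry.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five students (illustrative values only).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

r = pearson_r(hours_studied, exam_score)
# The sign of r gives the direction; its magnitude (0 to 1) gives the strength.
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relation, and a value near zero indicates little or no linear relation.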
Regression analysis (1805), developed by Adrien-Marie Legendre (1752–1833), is a method for using one or more independent variables or predictors to predict a single dependent variable or outcome. The relations among the variables are used to develop a prediction model. Because only one dependent variable can be considered, regression analysis can only be used to test simple theoretical models. A related method, created by George Udny Yule (1871–1951), is that of the multiple correlation (1897); it represents the correlation between multiple independent variables and a single dependent variable. The multiple correlation is a direct extension of the bivariate correlation for situations involving multiple independent variables. Path analysis (1918), created by Sewall Wright (1889–1988), is an extension of regression analysis for more than a single dependent or outcome variable. Here more complex theoretical models can be tested, as the relations among multiple independent variables and multiple dependent variables can be simultaneously considered.
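For the special case of two predictors, the multiple correlation can be computed directly from the three bivariate correlations, which shows how it extends the bivariate case. The following is a minimal sketch under that assumption; the function name and the example correlations are hypothetical and purely illustrative.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation R of Y with two predictors, X1 and X2.

    r_y1, r_y2: bivariate correlations of Y with X1 and with X2.
    r_12: bivariate correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("the predictors must not be perfectly correlated")
    # Standard two-predictor formula for the squared multiple correlation.
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

# Hypothetical correlations (illustrative values only).
R = multiple_r_two_predictors(r_y1=0.5, r_y2=0.4, r_12=0.3)
print(f"R = {R:.3f}, R squared = {R**2:.3f}")
```

When the two predictors are uncorrelated (r_12 = 0), the squared multiple correlation reduces to the sum of the two squared bivariate correlations; overlap between the predictors changes that total.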
Canonical correlation analysis (1935), created by Harold Hotelling (1895–1973), is used to determine the correlation between linear combinations of two sets of variables. Statistically this process is superior to examining a multitude of bivariate correlations (both within and across sets). For example, there may be one set of independent variables and a second set of dependent variables. This method takes the best linear combinations from each set of variables and generates a canonical correlation between the combinations of the two sets. This method represents an extension of the bivariate correlation and the multiple correlation for situations involving multiple independent variables and multiple dependent variables (or simply for two separate sets of variables).
The previously described methods examine the relations among what are known as observed variables. For example, the Stanford-Binet IQ measure is an instrument that produces an observed measured variable (or score) that can be used to infer intelligence. Latent variables (also known as constructs or factors) are variables that are not directly observed or measured but can be indirectly measured or inferred from a set of observed variables. The Stanford-Binet is one possible observed measure of the latent variable intelligence.
The following methods use both observed variables and latent variables. Factor analysis (Spearman, 1904; Thurstone, 1931) and principal component analysis (Pearson, 1901; Hotelling, 1933) are related multivariate correlational methods. Their purpose is to reduce a set of correlated variables into a smaller set of linear combinations of those variables, known as latent factors or components. For example, with a battery of intelligence tests, one can determine how many factors underlie the data (e.g., a single general intelligence factor, specific performance and verbal intelligence factors, etc.).
Structural equation modeling (Jöreskog, 1973; Keesling, 1972; Wiley, 1973) combines factor analysis with path analysis to test theoretical relations among latent variables. Here models can range from simple to complex in nature in that any number of variables of any type can be involved (i.e., observed, latent, independent, and/or dependent variables). The incorporation of factor analysis in structural equation modeling allows the researcher to use multiple measures of each latent variable instead of a single measure, thereby enabling better measurement conditions (i.e., reliability and validity) than with a single measure. An example is determining the relationship between an intelligence latent variable and an achievement latent variable, in which each latent variable is measured through multiple indicator variables.
What follows are a few prototypical examples of correlational research that educational and psychological researchers have investigated. Bivariate correlations determined the relations between math anxiety measures and teacher confidence measures (Bursal & Paznokas, 2006). Their results indicated that low math-anxious pre-service teachers were more confident in teaching math and science than high math-anxious pre-service teachers. Regression analysis was used to predict student exam scores in statistics (dependent variable) from a series of collaborative learning group assignments (independent variables) (Delucchi, 2006). The results provided some support for collaborative learning groups improving statistics exam performance, although not for all tasks. Multiple correlations were computed between a nonverbal test of intelligence (dependent variable) and various ability tests (independent variables) (Domino & Morales, 2000). The nonverbal test was significantly correlated with grade point average and ability test scores for Mexican American students.
In a path analysis example, Walberg's theoretical model of educational productivity was tested for fifth- through eighth-grade students (Parkerson et al., 1984). The relations among the following variables were analyzed in a single model: home environment, peer group, media, ability, social environment, time on task, motivation, and instructional strategies. All of the hypothesized paths among those variables were shown to be statistically significant, providing support for the educational productivity model. A canonical correlation analysis study examined battered women who killed their abusive male partners (Hattendorf, Ottens, & Lomax, 1999). There were two sets of variables: (1) frequency and severity of posttraumatic stress disorder (PTSD) symptoms, and (2) severity of types of abuses inflicted. The set of symptom variables was found to be highly related to the set of abuse variables, thus indicating a strong relationship between PTSD symptoms and severity of abuse. Another more general example involves the relation between a set of student personality variables and a set of student achievement variables.
In terms of factor analysis and principal component analysis, early examples considered the structure underlying different measures of intelligence (subsequently developed into theories of intelligence). Similar work has examined the dimensions of the Big Five personality assessments.
Finally, two examples of structural equation modeling involving both latent and observed variables can be given here. Kenny, Lomax, Brabeck, and Fife (1998) examined the influence of parental attachment on psychological well-being for adolescents. In general, maternal attachment had a stronger effect on well-being for girls, while paternal attachment had a stronger effect on well-being for boys. Shumow and Lomax (2002) tested a theoretical model of parental efficacy for adolescent students. For the overall sample, neighborhood quality predicted parental efficacy, which predicted parental involvement and monitoring, both of which predicted academic and social-emotional adjustment.
Correlational research has played an important role in the history of educational and psychological research. Early on, the bivariate correlation was used in heredity research and then eventually expanded into all areas of educational and psychological inquiry. Subsequently more sophisticated multivariate extensions enabled researchers to examine multiple variables simultaneously. Correlational research has had and will continue to have an important role in quantitative research in terms of exploring the nature of the relations among a collection of variables. In part, unrelated variables can be eliminated from further consideration, thereby allowing the researcher to give more serious consideration to related variables.
Correlational research can also play an important role in the development and testing of theoretical models. Once the nature of bivariate relations has been determined, this information can then be used to develop theoretical models. The idea here is to attempt to explain the nature of the bivariate correlations rather than to simply report them. At this point, methods such as factor analysis, path analysis and structural equation modeling can come into play.
When consuming or conducting correlational research, there are a number of issues to consider, with some issues being positive and others negative in nature. On the positive side, once descriptive research has helped to identify the important variables, correlational research can then be used to examine the relations among those important variables. For example, researchers may be interested in determining which variables are most highly related to a particular outcome, such as student achievement. This can then lead into experimental research in which the causal relations among those key variables can be examined under more tightly controlled conditions. Here one independent variable can be manipulated by the researcher (e.g., method of instruction), with other related variables being controlled in some fashion (e.g., grade, level of school funding). This then leads to a determination of the impact of the independent variable on the outcome variable, allowing a test of strong causal inference.
On the negative side, a limitation of correlational research is that it does not allow tests of strong causal inference. For example, if researchers find a high bivariate correlation between amount of instructional time (X) and student achievement (Y), then they may ask if this correlation necessarily implies that more instructional time causes higher achievement. The answer is not necessarily. Two variables X and Y can be highly correlated for any of the following reasons and others: (a) X causes Y; (b) Y causes X; (c) Z causes both X and Y, but X and Y are not causally related; (d) X and Y both cause Z, but X and Y are not causally related; and (e) many other variables might be involved. In addition, for a causal relationship X must occur before Y. Thus a bivariate correlation coefficient gives information about the nature of the relations between two variables, but not why they are related. Theoretical models of educational and psychological phenomena tend to be rather complex, certainly involving more than simply two variables. More sophisticated correlational methods, such as factor analysis, path analysis, or structural equation modeling, have the ability to examine the underlying relations among many variables and can, therefore, be used as a basis to argue for causal inference.
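Case (c), in which a third variable Z causes both X and Y, can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated rather than real, the variable names and helper functions are hypothetical, and the first-order partial correlation is used here only as one simple way to show that the X–Y correlation vanishes once Z is taken into account.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing the linear effect of Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

random.seed(0)  # fixed seed so the simulation is repeatable
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]  # X depends on Z only, not on Y
y = [zi + random.gauss(0, 1) for zi in z]  # Y depends on Z only, not on X

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(f"r(X, Y) = {r_xy:.3f}")
print(f"r(X, Y | Z) = {r_xy_given_z:.3f}")
```

In this simulation X and Y are clearly correlated even though neither causes the other, and the partial correlation controlling for Z is close to zero.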
Another limitation of correlational methods is that they commonly assume that the variables are linearly related to one another. For example, variables X and Y can be shown to have a linear relationship if the data can be nicely fitted by a straight line. When variables are not linearly related, correlational methods will understate the strength of the relationship (in other words, the linear correlation will be closer to zero). Therefore, nonlinear relationships will result in smaller linear correlations, possibly misleading the researcher and the field of inquiry. Outliers, observations that are quite a bit different from the remaining observations, can also distort the estimated strength of the relationship, often reducing it. It is wise for researchers to examine their data to see if (a) variables are linearly related (e.g., by the use of scatterplots), and (b) there are any influential observations (i.e., outliers).
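Both points can be seen in a short sketch with made-up data. The values and names below are hypothetical and purely illustrative: the first data set has a perfect but curved (quadratic) relation, and the second adds a single outlying observation to an otherwise strongly linear pattern.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# (a) A perfect nonlinear relation: Y is completely determined by X,
#     yet the linear correlation does not detect it.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# (b) A strongly linear pattern, with and without one outlying observation.
x_line = [1, 2, 3, 4, 5, 6, 7, 8]
y_line = [2, 3, 5, 4, 6, 8, 7, 9]
r_without_outlier = pearson_r(x_line, y_line)
r_with_outlier = pearson_r(x_line + [9], y_line + [0])

print(f"quadratic relation:   r = {r_curve:.3f}")
print(f"linear, no outlier:   r = {r_without_outlier:.3f}")
print(f"linear, with outlier: r = {r_with_outlier:.3f}")
```

A scatterplot of either data set would reveal the curve or the outlier immediately, which is why inspecting the data before interpreting a correlation is recommended.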
A final limitation of correlational research occurs when a researcher seeks to consider the relations among every possible variable. The idea is that if researchers examine the relations among enough variables, then certainly some variables will be significantly related. While there is an exploratory consideration here, in terms of seeing which variables are related, there is a statistical consideration as well. That is, if researchers examine enough bivariate correlations, they will find some variables that are significantly related by chance alone. For example, if they examine 100 correlations at the .05 level of significance, then they can expect about five correlations to appear significantly different from zero even if none of the true correlations differs from zero. In this case, the more sophisticated multivariate correlational methods can be useful in that fewer tests of significance tend to be done than in the bivariate case.
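The arithmetic behind the 100-correlation example can be written out as a brief sketch. The function names are hypothetical, and the second calculation rests on the simplifying assumption that the tests are independent, which is rarely exactly true for correlations computed on the same sample.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of significant results when every null hypothesis is true."""
    return num_tests * alpha

def prob_at_least_one_false_positive(num_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

num_tests = 100  # number of correlations examined, as in the example above
alpha = 0.05     # significance level for each test

expected = expected_false_positives(num_tests, alpha)
at_least_one = prob_at_least_one_false_positive(num_tests, alpha)
print(f"expected false positives: {expected:.1f}")
print(f"P(at least one false positive): {at_least_one:.3f}")
```

Under these assumptions, finding a handful of significant correlations among 100 is what chance alone would produce, so such findings need replication or theoretical support before being interpreted.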
Correlational methods of inquiry have been popular in educational and psychological research for quite some time in part because they are foundational in nature in terms of their ability to examine the relations among a number of variables. Also, correlational methods can be used to develop and test theoretical models (e.g., factor analysis, path analysis, structural equation modeling). Despite the limitations of correlational research described here, these methods will continue to be used. Additional information on correlational methods can be found in Grimm and Yarnold (1995, 2000), Lomax (2007), and Schumacker and Lomax (2004).
Bursal, M., & Paznokas, L. (2006). Mathematics anxiety and pre-service teachers' confidence to teach mathematics and science. School Science & Mathematics, 106, 173–180.
Delucchi, M. (2006). The efficacy of collaborative learning groups in an undergraduate statistics course. College Teaching, 54, 244–248.
Domino, G., & Morales, A. (2000). Reliability and validity of the D-48 with Mexican American college students. Hispanic Journal of Behavioral Sciences, 22, 382–389.
Grimm, L. G., & Yarnold, P. R. (Eds.) (1995). Reading and understanding multivariate statistics. Washington, DC: APA.
Grimm, L. G., & Yarnold, P. R. (Eds.) (2000). Reading and understanding more multivariate statistics. Washington, DC: APA.
Hattendorf, J., Ottens, A. J., & Lomax, R. G. (1999). Type and severity of abuse and posttraumatic stress disorder symptoms reported by battered women who killed abusive partners. Violence Against Women, 5, 292–312.
Kenny, M. E., Lomax, R. G., Brabeck, M. M., & Fife, J. (1998). Longitudinal pathways linking maternal and paternal attachments to psychological well-being. Journal of Early Adolescence, 18, 221–243.
Lomax, R. G. (2007). An introduction to statistical concepts (2nd ed.). Mahwah, NJ: Erlbaum.
Parkerson, J. A., Lomax, R. G., Schiller, D. P., & Walberg, H. J. (1984). Exploring causal models of educational achievement. Journal of Educational Psychology, 76, 638–646.
Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling (2nd ed.). Mahwah, NJ: Erlbaum.
Shumow, L., & Lomax, R. G. (2002). Parental efficacy: Predictor of parenting behavior and adolescent outcomes. Parenting: Science and Practice, 2, 127–150.