
# Correlation Principles Help (page 2)

By McGraw-Hill Professional
Updated on Sep 12, 2011

## Correlation is Linear

There are plenty of computer programs that can calculate correlation numbers based on data input or scatter plots. In this book, we won't get into the actual formulas used to calculate correlation. The formulas are messy and tedious for any but the most oversimplified examples. At this introductory level, it's good enough for you to remember that correlation is a measure of the extent to which the points in a scatter plot are concentrated near the least-squares line.

The key word in correlation determination is "line." Correlation in a scatter plot is defined by how near the points are to a particular straight line determined from the points on the plot. If all the points lie along a perfectly straight line (one that is neither horizontal nor vertical), then either r = –1 or r = +1. The value of r is positive if the two variables tend to increase together. The value of r is negative if one variable tends to decrease as the other increases.

Once in a while, you'll see a scatter plot in which all the points lie along a smooth curve, but that curve is not a straight line. This is a special sort of perfection in the relationship between the variables; it indicates that one is a mathematical function of the other. But points on a non-straight curve do not indicate a correlation of either –1 or +1. Figure 7-1A shows a scatter plot in which the correlation is +1.

Figure 7-1B shows a scatter plot in which the relationship is perfect in the sense that the points lie along a smooth curve, but the correlation is in fact much less than +1.
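The contrast between a straight line and a smooth curve can be checked numerically. The sketch below is only an illustration: the data are made up (they are not the points from Fig. 7-1), and the helper `pearson_r` is a hypothetical name for a plain implementation of the usual Pearson correlation coefficient, which the text deliberately does not derive.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: ten x values, one set of y values on a straight line
# and one set on a smooth (exponential) curve.
xs = list(range(1, 11))
line_ys = [2 * x + 1 for x in xs]   # every point lies exactly on y = 2x + 1
curve_ys = [2 ** x for x in xs]     # every point lies exactly on y = 2^x

r_line = pearson_r(xs, line_ys)
r_curve = pearson_r(xs, curve_ys)

print(f"straight line: r = {r_line:.3f}")
print(f"smooth curve:  r = {r_curve:.3f}")
```

In both data sets one variable is an exact function of the other, yet only the straight-line set gives r = +1; the curved set gives a value noticeably below +1.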

## Correlation and Outliers

In some scatter plots, the points are concentrated near smooth curves or lines, although it is rare for any scatter plot to contain points as orderly as those shown in Fig. 7-1A or B. Once in a while, you'll see a scatter plot in which almost all of the points lie near a straight line, but there are a few points that are far away from the main group. Stray points of this sort are known as outliers. These points are, in some ways, like the outliers found in statistical distributions.

One or two "extreme outliers" can greatly affect the correlation between two variables. Consider the example of Fig. 7-2. This is a scatter plot in which all but two of the points are in exactly the same positions as they are in Fig. 7-1A. But the two outliers are both far from the least-squares line. These points happen to be at equal distances (indicated by d) from the line, so their net effects on the position of the line cancel each other. Thus, the least-squares line in Fig. 7-2 is in the same position as the least-squares line in Fig. 7-1A. But the correlation values are very different. In Fig. 7-1A, r = +1. In the situation shown by Fig. 7-2, r is much smaller than +1.
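A small numerical sketch shows the same effect. It rests on several assumptions that are not in the text: the data are invented rather than taken from Fig. 7-2, the two outliers are placed at the same x value directly above and below the line, and the distance d is measured vertically. The function names `least_squares_line` and `pearson_r` are hypothetical helpers written for this illustration.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical base data: ten points lying exactly on y = 2x + 1.
base_xs = list(range(1, 11))
base_ys = [2 * x + 1 for x in base_xs]

# Assumption: two outliers at the same x, a vertical distance d above
# and below the line, so their pulls on the fitted line cancel.
d = 15
x0 = 5
y0 = 2 * x0 + 1
out_xs = base_xs + [x0, x0]
out_ys = base_ys + [y0 + d, y0 - d]

slope_a, intercept_a = least_squares_line(base_xs, base_ys)
slope_b, intercept_b = least_squares_line(out_xs, out_ys)
r_a = pearson_r(base_xs, base_ys)
r_b = pearson_r(out_xs, out_ys)

print(f"without outliers: slope={slope_a:.3f}, intercept={intercept_a:.3f}, r={r_a:.3f}")
print(f"with outliers:    slope={slope_b:.3f}, intercept={intercept_b:.3f}, r={r_b:.3f}")
```

Under these assumptions the fitted line is the same in both cases, while r drops well below +1 once the two outliers are included.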

## Correlation and Definition of Variables

Here's another important rule concerning correlation. It doesn't matter which variable is defined as dependent and which variable is defined as independent. If the definitions of the variables are interchanged, and nothing about the actual scenario changes, the correlation remains exactly the same.

Think back to the previous chapter, where we analyzed the correlation between average monthly temperatures and average monthly rainfall amounts for two cities. When we generated the scatter plots, we plotted temperature on the horizontal axis, and considered temperature to be the independent variable. However, we could just as well have plotted the rainfall amounts on the horizontal axis, and defined rainfall as the independent variable. The resulting scatter plots would have looked different, but upon mathematical analysis, the correlation figures would have come out the same.
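This symmetry is easy to confirm with a short sketch. The monthly temperature and rainfall values below are invented for illustration (they are not the Happyton or Blissville data from the previous chapter), and `pearson_r` is a hypothetical helper implementing the usual Pearson correlation coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up monthly averages for one imaginary town (not data from the book).
temperature = [5, 8, 12, 17, 22, 26, 29, 28, 24, 18, 11, 6]
rainfall = [90, 80, 75, 60, 50, 35, 20, 25, 40, 55, 70, 85]

# Temperature treated as the independent variable (horizontal axis).
r_temp_first = pearson_r(temperature, rainfall)

# Rainfall treated as the independent variable instead.
r_rain_first = pearson_r(rainfall, temperature)

print(f"temperature on the horizontal axis: r = {r_temp_first:.4f}")
print(f"rainfall on the horizontal axis:    r = {r_rain_first:.4f}")
```

The two printed values agree, because the calculation treats the two variables in exactly the same way; swapping them changes the picture but not the number.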

Sometimes a particular variable lends itself intuitively to the role of the independent variable. (Time is an excellent example of this, although there are some exceptions.) In the cases of Happyton and Blissville from the previous chapter, it doesn't matter much which variable is considered independent and which is considered dependent. In fact, these very labels can be misleading, because they suggest causation. Does the temperature change, over the course of the year, actually influence the rainfall in Happyton or Blissville? If so, the effects are opposite between the two cities. Or is it the other way around – rainfall amounts influence the temperature? Again, if that is true, the effects are opposite between the two cities. There is something a little weird about either assumption. Perhaps another factor, or even a combination of multiple factors, influences both the temperature and the rainfall in both towns.
