Describing and Displaying Bivariate Data Study Guide

Updated on Oct 5, 2011


Drop height is the explanatory variable because this is the variable that is controlled during the study. Rebound height is the response variable because the rebound height was measured for a given drop height. Thus, drop height is on the x-axis and rebound height is on the y-axis. The data are plotted in Figure 8.2.

Figure 8.2

The ball did not always have the same rebound height when it was dropped repeatedly from a specific drop height; there was variability in the rebound heights for a given drop height. The rebound height tends to increase in a linear manner as the drop height increases, though the relationship is certainly not an exact one.

Pearson's Correlation Coefficient

One of the challenges in working with two or more variables is that they can have different units of measurement (inches, pounds, liters, etc.), means, standard deviations, or other characteristics. It is often helpful to put all variables on a common scale. Although many scales are possible, we transform the original values of each variable so that the mean is zero and the standard deviation is one. z-scores are the transformed values of a random variable that have a mean of zero and a standard deviation of one; that is, each value has the mean subtracted from it and is then divided by the standard deviation:

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$

If all population values are known, the population mean and standard deviation are used to find the z-scores; if only sample values are available, the sample mean and sample standard deviation are used instead.

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a random sample of n (x, y) pairs. Suppose we replace each x-value by its z-score, $z_X = (x - \bar{x})/s_X$, found by subtracting the sample mean, $\bar{x}$, and dividing by the sample standard deviation, $s_X$. (The subscript on s indicates the variable, here X, whose sample standard deviation it is. Subscripts are often used this way to avoid confusion when more than one variable is of interest.) Similarly, suppose that each y-value is replaced by its z-score, $z_Y = (y - \bar{y})/s_Y$. Note that if x (or y) is larger than its sample mean $\bar{x}$ (or $\bar{y}$), then $z_X$ (or $z_Y$) is positive. Likewise, if x (or y) is smaller than its sample mean $\bar{x}$ (or $\bar{y}$), then $z_X$ (or $z_Y$) is negative.
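The standardization just described can be sketched in a few lines of Python. This is a minimal illustration, assuming Python's standard-library `statistics` module; the helper name `sample_z_scores` and the drop-height values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def sample_z_scores(values):
    """Hypothetical helper: standardize values with the sample mean and
    sample standard deviation, z = (x - x_bar) / s_X."""
    x_bar = mean(values)
    s_x = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - x_bar) / s_x for v in values]

# Hypothetical drop heights, for illustration only
drop_heights = [20, 40, 60, 80, 100]
z_x = sample_z_scores(drop_heights)
print([round(z, 3) for z in z_x])
```

Values below the mean get negative z-scores, values above it get positive ones, and the z-scores themselves have a sample mean of 0 and a sample standard deviation of 1.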

Consider the sample of (x, y) pairs displayed in the graph in Figure 8.3. There is clearly a strong positive relationship between X and Y. The dashed horizontal line through $\bar{y}$ and the dashed vertical line through $\bar{x}$ divide the graph into four quadrants, labeled I, II, III, and IV. In quadrant I, both x and y are above their respective sample means; thus, $z_X$ and $z_Y$ are positive, and $z_X z_Y$ is positive. For (x, y) in quadrant II, x is below its sample mean and y is above its sample mean; therefore, $z_X z_Y$ is negative. In quadrant III, $z_X z_Y$ is positive because $z_X$ and $z_Y$ are both negative, and the product of two negative numbers is positive. Finally, in quadrant IV, x is above its mean and y is below its mean, so $z_X z_Y$ is negative. For the rebounding ball example, almost all of the points are in quadrants I and III, so the sum of the products, $\sum z_X z_Y$, would be positive. In contrast, if most of the points lay in quadrants II and IV, $\sum z_X z_Y$ would be negative.
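The quadrant reasoning above can be checked mechanically. The sketch below, which assumes a hypothetical helper `quadrant_summary` and made-up sample values, labels each point by quadrant and records the product of its deviations from the means. That product has the same sign as $z_X z_Y$, because dividing by the positive standard deviations $s_X$ and $s_Y$ does not change the sign.

```python
from statistics import mean

def quadrant_summary(xs, ys):
    """Hypothetical helper: label each (x, y) pair with its quadrant relative
    to the dashed lines x = x_bar and y = y_bar, plus the product of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    rows = []
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx > 0 and dy > 0:
            q = "I"
        elif dx < 0 and dy > 0:
            q = "II"
        elif dx < 0 and dy < 0:
            q = "III"
        elif dx > 0 and dy < 0:
            q = "IV"
        else:
            q = "on a dashed line"  # x equals x_bar or y equals y_bar
        # dx * dy has the same sign as z_X * z_Y because s_X, s_Y > 0
        rows.append((x, y, q, dx * dy))
    return rows

# Hypothetical sample with a positive relationship
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 6]
rows = quadrant_summary(xs, ys)
for x, y, q, prod in rows:
    print(x, y, q, "positive" if prod > 0 else "negative" if prod < 0 else "zero")
print("sum of products:", sum(row[3] for row in rows))
```

Every point in this made-up sample falls in quadrant I or III, so every product is positive and so is their sum.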

Figure 8.3

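These ideas are the foundation for Pearson's correlation coefficient r, which measures the strength and direction of the linear relationship between X and Y. Pearson's correlation coefficient is defined to be

$r = \frac{\sum z_X z_Y}{n - 1}$

that is, the sum of the products of the z-scores divided by one less than the sample size.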
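The definition translates directly into code: standardize both variables with their sample means and standard deviations, add up the products, and divide by n − 1. This is a minimal sketch, assuming the standard-library `statistics` module; `pearson_r` is a hypothetical helper name, and the sample values are made up.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson's r as sum(z_X * z_Y) / (n - 1).

    Assumes neither variable is constant (s_X > 0 and s_Y > 0).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical sample
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 6]
print(round(pearson_r(xs, ys), 3))  # prints 0.886
```

Dividing by n − 1 matches the use of sample standard deviations, which also divide by n − 1, in the z-scores.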

Figure 8.4

The correlation coefficient has some important properties. First, r is unitless; that is, it does not depend on the units of measurement of either variable. X and Y can be measured in inches, meters, or light years, and the value of r would not change. Second, it does not matter which variable is labeled X and which is labeled Y; the value of r is the same. Third, Pearson's r always lies between –1 and +1, inclusive. A value of +1 or –1 occurs only when an exact linear relationship exists between X and Y: if r = +1, the slope of the line is positive; if r = –1, the slope of the line is negative. The closer r is to +1 or –1, the stronger the linear relationship between X and Y. Finally, it is important to realize that r measures only the linear relationship between X and Y. It is possible for X and Y to have a very strong relationship and yet for r to be near zero; in such cases, the strong relationship is not linear. Some scatter plots with their associated r values are shown in Figure 8.4.
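Each property above can be demonstrated numerically. The sketch below is an illustration under stated assumptions, not a proof. It reuses the z-score definition of r (the hypothetical helper `pearson_r`, redefined here so the block runs on its own) on made-up data. It checks that converting units with a positive scale factor (inches to centimeters, 2.54) leaves r unchanged, that swapping X and Y leaves r unchanged, that an exact line with negative slope gives r = –1, and that the perfect but nonlinear relationship y = x² on x-values symmetric about zero gives r = 0.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson's r as sum(z_X * z_Y) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical sample, imagined as measured in inches
x = [1, 2, 3, 4, 5, 6]
y = [1, 3, 2, 5, 4, 6]
r = pearson_r(x, y)

# Unitless: converting both variables to centimeters does not change r
r_cm = pearson_r([2.54 * v for v in x], [2.54 * v for v in y])

# Symmetric: swapping the roles of X and Y does not change r
r_swapped = pearson_r(y, x)

# Exact linear relationship with negative slope gives r = -1
r_line = pearson_r(x, [10 - 3 * v for v in x])

# Strong but nonlinear: y = x**2 on x symmetric about 0 gives r = 0
x_sym = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(x_sym, [v ** 2 for v in x_sym])

for label, value in [("original", r), ("centimeters", r_cm),
                     ("swapped", r_swapped), ("exact line", r_line),
                     ("y = x^2", r_curve)]:
    print(f"{label:12s} r = {value:.3f}")
```

The last case shows that r = 0 does not mean "no relationship": y is completely determined by x, yet the relationship has no linear component.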
