Regression Help
Introduction to Regression—Paired Data
Regression is a way of defining the extent to which two variables are related. Regression can be used in an attempt to predict things, but this can be tricky. The existence of a correlation between variables does not always mean that there is a cause-and-effect link between them.
Imagine two cities, one named Happyton and the other named Blissville. These cities are located far apart on the continent. The prevailing winds and ocean currents produce greatly different temperature and rainfall patterns throughout the year in these two cities. Suppose we are about to move from Happyton to Blissville, and we've been told that Happyton "has soggy summers and dry winters," while in Blissville we should be ready to accept that "the summers are parched and the winters are washouts." We've also been told that the temperature difference between summer and winter is much smaller in Blissville than in Happyton.
We go to the Internet and begin to collect data about the two towns. We find a collection of tables showing the average monthly temperature in degrees Celsius (°C) and the average monthly rainfall in centimeters (cm) for many places throughout the world. Happyton and Blissville are among the cities shown in the tables. Table 6-1A shows the average monthly temperature and rainfall for Happyton as gathered over the past 100 years. Table 6-1B shows the average monthly temperature and rainfall for Blissville over the same period. The data we have found is called paired data, because it portrays two variable quantities, temperature and rainfall, side-by-side.
We can get an idea of the summer and winter weather in both towns by scrutinizing the tables. But we can get a clearer visual portrayal by making use of bar graphs.
Paired Bar Graphs
Let's graphically compare the average monthly temperature and the average monthly rainfall for Happyton. Figure 6-8A is a paired bar graph showing the average monthly temperature and rainfall there.
The graph is based on the data from Table 6-1A. The horizontal axis has 12 intervals, each one showing a month of the year. Time is the independent variable. The left-hand vertical scale portrays the average monthly temperatures, and the right-hand vertical scale portrays the average monthly rainfall amounts. Both of these are dependent variables, and are functions of the time of year. The average monthly temperatures are shown by the light gray bars, and the average monthly rainfall amounts are shown by the dark gray bars. It's easy to see from this data that the temperature and rainfall both follow annual patterns. In general, the warmer months are wetter than the cooler months in Happyton.
Now let's make a similar comparison for Blissville. Figure 6-8B is a paired bar graph showing the average monthly temperature and rainfall there, based on the data from Table 6-1B.
From this data, we can see that the temperature difference between winter and summer is less pronounced in Blissville than in Happyton. But that's not the main thing that stands out in this bar graph! Note that the rainfall, as a function of the time of year, is much different. The winters in Blissville, especially the months of January and February, are wet. The summers, particularly June, July, and August, get almost no rainfall. The contrast in general climate between Happyton and Blissville is striking. This information is, of course, contained in the tabular data, but it's easier to see by looking at the dual bar graphs.
When we examine Fig. 6-8A, it appears there is a relationship between temperature and rainfall for the town of Happyton. In general, as the temperature increases, so does the amount of rain. There is also evidently a relationship between temperature and rainfall in Blissville, but it goes in the opposite sense: as the temperature increases, the rainfall decreases. How strong are these relationships? We can draw scatter plots to find out.
In Fig. 6-9A, the average monthly rainfall is plotted as a function of the average monthly temperature for Happyton. One point is plotted for each month, based on the data from Table 6-1A. In this graph, the independent variable is the temperature, not the time of the year. There is a pattern to the arrangement of points. The correlation between temperature and rainfall is positive for Happyton. It is fairly strong, but not extremely so. If there were no correlation (that is, if the correlation were equal to 0), the points would be randomly scattered all over the graph. But if the correlation were perfect (either +1 or –1), all the points would lie along a straight line.
Figure 6-9B shows a plot of the average monthly rainfall as a function of the average monthly temperature for Blissville. One point is plotted for each month, based on the data from Table 6-1B. As in Fig. 6-9A, temperature is the independent variable. There is a pattern to the arrangement of points here, too. In this case the correlation is negative instead of positive. It is a fairly strong correlation, perhaps a little stronger than the positive correlation for Happyton, because the points seem more nearly lined up. But the correlation is far from perfect.
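The strength and direction of relationships like these can be expressed as a single number, the correlation coefficient. Here is a minimal sketch in Python, assuming the Pearson product-moment coefficient is the measure we want. The monthly figures are hypothetical stand-ins, not the values from Tables 6-1A and B, and the names pearson_r, happyton, and blissville are made up for illustration.

```python
from math import sqrt

def pearson_r(pairs):
    """Return the Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL paired data, one pair per month (January through December):
# (average temperature in degrees C, average rainfall in cm).
happyton = [(2, 1.0), (4, 2.5), (9, 2.0), (14, 4.5), (19, 4.0), (23, 7.0),
            (26, 6.0), (25, 8.0), (21, 5.5), (15, 3.0), (8, 3.5), (3, 1.5)]
blissville = [(11, 9.0), (12, 8.0), (14, 6.5), (16, 4.0), (19, 2.5), (22, 0.5),
              (24, 0.3), (24, 0.4), (22, 1.5), (18, 3.5), (14, 6.0), (12, 8.5)]

r_happyton = pearson_r(happyton)
r_blissville = pearson_r(blissville)

# Points that lie exactly on a rising straight line give a correlation of +1.
r_perfect = pearson_r([(t, 2 * t + 1) for t in range(12)])

print(f"Happyton:   r = {r_happyton:+.3f}")
print(f"Blissville: r = {r_blissville:+.3f}")
print(f"Perfect:    r = {r_perfect:+.3f}")
```

With these made-up numbers, the sign of r gives the direction of the relationship, and the absolute value gives its strength. That is why a negative correlation can be "stronger" than a positive one.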
The technique of curve fitting, which we learned about in Chapter 1, can be used to illustrate the relationships among points in scatter plots such as those in Figs. 6-9A and B.
Examples, based on "intuitive guessing," are shown in Figs. 6-10A and B. Fig. 6-10A shows the same 12 points as those in Fig. 6-9A, representing the average monthly temperature and rainfall amounts for Happyton (without the labels for the months, to avoid cluttering things up). The dashed curve represents an approximation of a smooth function relating the two variables. In Fig. 6-10B, a similar curve-fitting exercise is done to approximate a function relating the average monthly temperature and rainfall for Blissville.
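The curves in the figures are drawn by eye. As an alternative, a fit can be computed. The sketch below uses ordinary least squares to fit a straight line, which is a different technique from the intuitive guessing described above and produces a line rather than a curve. The data are hypothetical stand-ins for Table 6-1A, and the names fit_line and happyton are assumptions made for illustration.

```python
def fit_line(pairs):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL paired data for Happyton, one pair per month:
# (average temperature in degrees C, average rainfall in cm).
happyton = [(2, 1.0), (4, 2.5), (9, 2.0), (14, 4.5), (19, 4.0), (23, 7.0),
            (26, 6.0), (25, 8.0), (21, 5.5), (15, 3.0), (8, 3.5), (3, 1.5)]

slope, intercept = fit_line(happyton)
print(f"rainfall = {slope:.3f} * temperature + {intercept:.3f}")

# Rainfall the fitted line suggests for a month averaging 20 degrees C.
estimate_at_20 = slope * 20 + intercept
print(f"Estimated rainfall at 20 degrees C: {estimate_at_20:.2f} cm")
```

A positive slope corresponds to the rising trend seen for Happyton. The same function applied to data shaped like Blissville's would return a negative slope.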
In our hypothetical scenarios, the data shown in Tables 6-1A and B, Figs. 6-8A and B, Figs. 6-9A and B, and Figs. 6-10A and B are all based on records gathered over 100 years. Suppose that we had access to records gathered over the past 1000 years instead! Further imagine that, instead of having data averaged by the month, we had data averaged by the week. In these cases we would get gigantic tables, and the bar graphs would be utterly impossible to read. But the scatter plots would tell a much more interesting story. Instead of 12 points, each graph would have 52 points, one for each week of the year. It is reasonable to suppose that the points would be much more closely aligned along smooth curves than they are in Figs. 6-9A and B or Figs. 6-10A and B.
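We can test this supposition with a small simulation. The sketch below is a toy model, not real climate data: it assumes a true underlying rainfall for each week that rises linearly with temperature, plus random year-to-year variation, and then averages the rainfall over 1, 100, or 1000 simulated years. The climate formula, the noise range, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def pearson_r(pairs):
    """Return the Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def weekly_averages(years, rng):
    """Return 52 (temperature, rainfall) pairs averaged over `years` years."""
    pairs = []
    for week in range(52):
        # HYPOTHETICAL climate: temperature swings between 5 and 25 degrees C
        # over the year, and the true rainfall rises linearly with it.
        temp = 15 + 10 * math.sin(2 * math.pi * (week - 13) / 52)
        true_rain = 1 + 0.3 * temp
        # Each simulated year, the week's rainfall is the true value scaled
        # by a random factor between 0.2 and 1.8 (mean factor of 1).
        total = sum(true_rain * rng.uniform(0.2, 1.8) for _ in range(years))
        pairs.append((temp, total / years))
    return pairs

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
r_by_years = {years: pearson_r(weekly_averages(years, rng))
              for years in (1, 100, 1000)}

for years, r in r_by_years.items():
    print(f"{years:>4} year(s) of records: r = {r:+.4f}")
```

Under these assumptions, the correlation moves closer to 1 as more years are averaged, because the random year-to-year variation cancels out while the underlying relationship remains.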