Unit 3: More Two-Variable Relationships
Main Concepts
Not all relationships are linear. In this unit we explore ways to handle non-linear relationships, along with some common errors in interpreting regression. Transforming non-linear relationships into linear ones is a difficult concept for many students.
• Transforming the data to linearity
Some non-linear relationships can be transformed into linear relationships by transforming the x variable, the y variable, or both. Although there is an infinite variety of transforming functions to consider, in practice (in this course) only power transformations, exponentials, and logarithms are used. Sometimes the appropriate transformation is suggested by a theory that prescribes the relationship (e.g. a physics equation); at other times one simply tries a transformation and checks whether it makes the residual plot look better.
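As a rough illustration of the "try a transformation and check the residuals" idea, here is a minimal Python sketch. The data are made up (exact exponential growth, y = 2 * 1.5^x), and the helper names `fit_line` and `r_squared` are hypothetical, written for this example only. It fits a line to the raw data and to (x, log y), then compares the results.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data (an assumption for illustration): y = 2 * 1.5**x, no noise
xs = list(range(1, 11))
ys = [2 * 1.5 ** x for x in xs]

# Fit a line to the raw data and look at the residuals
raw_intercept, raw_slope = fit_line(xs, ys)
raw_residuals = [y - (raw_intercept + raw_slope * x) for x, y in zip(xs, ys)]
r2_raw = r_squared(xs, ys)

# Transform y with the natural log and fit again
log_ys = [math.log(y) for y in ys]
log_intercept, log_slope = fit_line(xs, log_ys)
r2_log = r_squared(xs, log_ys)

print("raw residuals:", [round(r, 1) for r in raw_residuals])
print("R-squared, raw data:", round(r2_raw, 3))
print("R-squared, log(y) vs x:", round(r2_log, 3))
# Undo the transformation to recover the growth factor and starting value
print("growth factor:", round(math.exp(log_slope), 3))
print("starting value:", round(math.exp(log_intercept), 3))
```

The raw residuals are positive at both ends and negative in the middle, the curved pattern that signals a line is the wrong model. After the log transformation the points fall on a line, and exponentiating the fitted slope and intercept recovers the growth factor and starting value. With real, noisy data the fit would not be perfect, and the residual plot, not R-squared alone, should guide the choice.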
• Errors in Interpretation of Regression
Causal relations cannot be inferred from regressions, correlations, or scatterplots. Language is important here. Algebra teaches us that the slope tells us the "change in y for a unit change in x". But in most contexts in which we interpret regression, x was not observed to change (and it may not even be possible to change it). A safer reading is that the slope describes how the predicted y differs, on average, between observations whose x values differ by one unit.
• Aggregate data
Many data sets are actually aggregates of larger collections of data, and the tidy patterns they show can give a false sense of how strong a relationship is. For example, suppose we plot the average SAT score by state against the average per-pupil expenditure by state. Each average SAT score summarizes the SAT scores of many thousands of students in the corresponding state. If we instead plotted the individual students' scores against the corresponding state's average expenditure, we would see a much "messier" scatterplot and a much lower R-squared value. The correct interpretation is therefore a statement about the relationship between states' average scores and states' expenditures, not about individuals' scores and states' expenditures (which would be a much weaker relationship).
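The effect of aggregation can be seen in a small simulation. The sketch below is an illustration under made-up assumptions, not real SAT data: 20 hypothetical "states", 200 simulated students per state, a true mean score of 500 + 5 * expenditure, and student-to-student noise with a standard deviation of 100. The helper `r_squared` is a hypothetical name written for this example.

```python
import random

def r_squared(xs, ys):
    """R-squared of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed setup: expenditure values 1..20 in arbitrary units, one per state
expenditures = list(range(1, 21))
students_per_state = 200

state_x, state_mean_score = [], []   # one point per state
student_x, student_score = [], []    # one point per student

for spend in expenditures:
    true_mean = 500 + 5 * spend
    scores = [rng.gauss(true_mean, 100) for _ in range(students_per_state)]
    state_x.append(spend)
    state_mean_score.append(sum(scores) / len(scores))
    # Each student is paired with his or her state's expenditure
    student_x.extend([spend] * len(scores))
    student_score.extend(scores)

r2_states = r_squared(state_x, state_mean_score)
r2_students = r_squared(student_x, student_score)

print("R-squared using state averages:", round(r2_states, 3))
print("R-squared using individual students:", round(r2_students, 3))
```

Averaging cancels most of the student-to-student variation, so the state-level plot looks far tighter than the student-level plot even though both come from the same simulated students. The slope is similar in both fits; what changes is how much scatter surrounds it.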
• Extrapolation
Try not to do this, particularly if you are a weatherman or stockbroker. Many phenomena are approximately linear over short segments but non-linear over a larger scale. If you predict a y-value for an x beyond the range of the observed data, you are implicitly assuming that the relationship continues to be linear (with the same slope) outside that range. This assumption is often impossible to verify and is frequently untrue.
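A small numerical sketch can make the danger concrete. The "true" relationship here is an assumption chosen for illustration, y = 10 * sqrt(x), which looks almost linear over the observed range x = 16 to 25. The names `true_curve`, `fit_line`, and `predict` are hypothetical helpers for this example.

```python
import math

def true_curve(x):
    """Assumed true relationship (for illustration only): y = 10 * sqrt(x)."""
    return 10 * math.sqrt(x)

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Observed data cover only a short segment of the curve
xs = list(range(16, 26))
ys = [true_curve(x) for x in xs]
intercept, slope = fit_line(xs, ys)

def predict(x):
    """Prediction from the fitted line."""
    return intercept + slope * x

# Error when predicting inside the observed range
inside_error = predict(20) - true_curve(20)
# Error when extrapolating far beyond the observed range
outside_error = predict(100) - true_curve(100)

print("error at x = 20 (inside range):", round(inside_error, 2))
print("error at x = 100 (outside range):", round(outside_error, 2))
```

Within the observed range the line is an excellent approximation, but at x = 100 the line keeps climbing at the same slope while the curve has flattened, so the prediction overshoots badly. Nothing in the observed data would have warned of this, which is why extrapolation is risky.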