Unit 3: More Two-Variable Relationships
Teaching Tips
• There is a tradeoff between interpretability and a perfect fit. When in doubt, it's often best to prefer an interpretable model over a perfect fit.
• Also, keep in mind the principle of parsimony. Simple and good is preferred over complicated and perfect.
• Students will want to know which variable to transform, or which to transform first. The "ladder of transformations" (see Resources) can help with this. Technology can also help. Fathom lets you easily try out different transforms to see which makes the scatterplot most linear.
• The "ladder of transformations" (see Resources) might be too much information for some students. This is okay, but make sure they understand how to use a log, square-root, or inverse transform.
• Students should not pick the best transformation simply by looking for the highest r-squared (or r) value. The correlation coefficient is only part of the story, and statisticians prefer a model with a lower r-squared value if its residual plot looks better.
• Students might find your discussion easier to follow if you identify the variables by name (e.g. "height") rather than a letter label (e.g. "x"). Thus, when you transform, it will be easier for students to follow "log height" as a transformed variable.
• Be careful inventing exponential examples on the fly, because sometimes the transformation also requires a shift. For example, with y = a + exp(bx) you must subtract a from y before taking the log: log(y - a) = bx is linear in x, but log(y) is not.
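The shift in the last tip can be checked numerically before class. The sketch below is only an illustration: the values of a and b and the x values are made up, and it uses the fact that, for evenly spaced x, a straight line has constant differences between neighboring y values.

```python
import math

# Hypothetical values, chosen only for illustration.
a, b = 5.0, 0.3
xs = [0, 1, 2, 3, 4]                    # evenly spaced x values
ys = [a + math.exp(b * x) for x in xs]  # data following y = a + exp(bx)

def successive_differences(values):
    """Differences between neighboring values (constant for a straight line)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Take the log without shifting, then with the shift by a.
diffs_raw = successive_differences([math.log(y) for y in ys])
diffs_shifted = successive_differences([math.log(y - a) for y in ys])

print("log(y):    ", [round(d, 3) for d in diffs_raw])
print("log(y - a):", [round(d, 3) for d in diffs_shifted])
```

The differences for log(y) keep growing, so that plot is still curved, while the differences for log(y - a) all equal b up to rounding, so that plot is a straight line with slope b.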
Student Misconceptions and Confusions
• As mentioned in the main concepts and in the previous unit, correlation does not imply causation. Just because two variables are correlated (have an association), it does not mean that the independent (or explanatory) variable causes the dependent (or response) variable to behave the way it does. A controlled experiment is necessary to conclude causation.
• Again, beware of extrapolation! Many physical phenomena are linear over a short range of values, but fail to be linear over a greater range. The moral of this story is that just because a trend appears to be linear for the data you observed does not mean it will be linear for data beyond this range.
• Interpreting the coefficients in a model that includes transformed variables is challenging for students. Be sure that their interpretations are in the proper units. For example, if the model is "degrees Fahrenheit" = 3 + 4*log(time), then the coefficient 4 relates differences in degrees to differences in log(time), not time. That is, if the log is base 10, a 10-fold increase in time corresponds to a 4-degree difference in temperature.
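The coefficient interpretation in the last point can be verified with a few lines of arithmetic. This is a minimal sketch using the example model from the text; the starting times are hypothetical, the time units are left unspecified as in the text, and the function name predicted_temp is made up for illustration. It also shows why the base of the log matters for the "10-fold" statement.

```python
import math

def predicted_temp(time, log=math.log10):
    """Example model from the text: degrees Fahrenheit = 3 + 4*log(time)."""
    return 3 + 4 * log(time)

# Hypothetical starting times.
start_times = [1, 5, 20]

# Change in predicted temperature when time is multiplied by 10 (base-10 log).
changes_base10 = [predicted_temp(10 * t) - predicted_temp(t) for t in start_times]

# The same comparison if "log" means the natural log instead.
changes_natural = [
    predicted_temp(10 * t, log=math.log) - predicted_temp(t, log=math.log)
    for t in start_times
]

print("base-10 log:", [round(c, 3) for c in changes_base10])
print("natural log:", [round(c, 3) for c in changes_natural])
```

With a base-10 log, every 10-fold increase in time adds 4 degrees no matter where you start. With a natural log, the same 10-fold increase adds 4*ln(10) degrees, and the 4-degree difference instead goes with multiplying time by e.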
Resources
• The "ladder of transformations": Yates, Moore, and Starnes, p. 201, or Peck, Olsen, and Devore, p. 250