Unit 2: Two-Variable Relationships



 Main Concepts

• This unit introduces the concept of a statistical model. A model is a set of assumptions about a variable or a relationship between variables. It is an idealization of reality, which we hope approximates reality closely enough for our purposes. Or, as George Box, a well-known statistician, once said: "All models are wrong; some are useful."

• Models can be used for different purposes, including summarizing relationships, making predictions, and understanding phenomena.

• Regression is a technique used to model relationships in which one dependent variable is explained by one or more independent variables. It is a complex and subtle tool about which entire books have been written. We will only scratch the surface in this unit, and we will return to the topic at the end of the course to scratch it once more.

• A surprisingly large number of interesting relationships can be modeled by a linear relationship. In this unit, we will focus on linear relationships between two numerical variables.

• A scatterplot is a graphical summary of the relationship between two numerical variables. It must be examined before fitting a model to judge whether a linear relationship is plausible.

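• The correlation coefficient is a statistic that measures the strength and direction of a linear relationship between two numerical variables. It ranges from -1 to 1: values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one, so its value tells us something about the predictive ability of a linear model. Correlation does not imply causation.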

• In addition to learning to interpret models, we also study how well suited they are to answering our questions and how well the model fits the data. R-squared, the correlation coefficient squared, helps to quantify the usefulness of the model: it is the proportion of the variation in the dependent variable that the linear model accounts for. Residual plots are useful tools for evaluating the fit of the model.
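
To make the last few concepts concrete, here is a minimal Python sketch that computes a correlation coefficient, fits a least-squares line, and checks that R-squared equals the correlation coefficient squared. It is an illustration only, not part of the course materials (which use Fathom): the function names are hypothetical, and the data set (hours studied and quiz scores) is invented for the example.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(x, y):
    """Correlation coefficient r between two numerical variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares regression line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed y minus the value predicted by the line, for each point."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def r_squared(y, resid):
    """Proportion of the variation in y accounted for by the fitted line."""
    my = mean(y)
    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Invented data for illustration: hours studied (independent variable)
# and quiz score (dependent variable).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = correlation(hours, scores)
slope, intercept = least_squares_line(hours, scores)
resid = residuals(hours, scores, slope, intercept)
r2 = r_squared(scores, resid)

print("r =", round(r, 3))
print("line: score =", round(intercept, 2), "+", round(slope, 2), "* hours")
print("R-squared =", round(r2, 3), " r squared =", round(r ** 2, 3))
print("residuals:", [round(e, 2) for e in resid])
```

For these invented numbers, r is about 0.77 and R-squared is about 0.6, so the line accounts for roughly 60% of the variation in the scores. Plotting the residuals against hours would give the residual plot mentioned above; a plot with no visible pattern suggests the linear model is a reasonable fit.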