Unit 2: Two-Variable Relationships

Exploratory Data Analysis
  • Again, this unit provides tools to answer questions about the relationships between two numerical variables in a given data set using numerical and graphical summaries.

  • For numerical summaries: Just as center, spread, and shape are used to describe univariate distributions (as in Unit 1), trend (direction), strength, and shape (form) are used to describe bivariate distributions.

  • For graphical summaries: Just as we use boxplots, histograms, and dotplots to visualize how a numerical variable varies across the categories of a categorical variable, we use scatterplots to visualize the relationship between two numerical variables. Residual plots are a form of scatterplot used to compare the values predicted by a model with the corresponding observed data. Each residual is an observed value minus the value the model predicts for it, and residuals are usually plotted against the explanatory variable.

  • Later in the course, we will learn to determine whether an apparent linear association might be due to chance, just as we would like to determine whether differences between groups (as covered in Unit 1) are due to chance.
  • This unit provides students with their first glimpse of statistical models. Throughout the course, students will learn about various types of models that they can use to answer questions about their data.

  • It is important to emphasize the theoretical nature of models in comparison with the real data (in this unit, for example, predicted values versus actual values). This can be related back to the previous unit, where we discussed that a distribution described as symmetric might not be perfectly symmetric. Students may struggle with this abstraction in upcoming units dealing with probability and simulation models, so introducing the idea early might help alleviate that confusion.
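As a rough illustration of the summaries described above, the Python sketch below does three things. First, it computes one common numerical summary of trend and strength, the correlation coefficient r. Second, it fits a least-squares line as a simple model. Third, it computes the residuals (actual minus predicted values), which are the vertical coordinates of a residual plot. The data and the function names (`correlation`, `least_squares_line`, `residuals`, `deviation_sums`) are hypothetical. Least squares is an assumed choice of fitting method, not necessarily the one used in this unit.

```python
from math import sqrt


def deviation_sums(xs, ys):
    """Return (s_xx, s_yy, s_xy): sums of squared and cross deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def correlation(xs, ys):
    """Pearson r: its sign gives the trend, |r| the strength of a LINEAR association."""
    s_xx, s_yy, s_xy = deviation_sums(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)


def least_squares_line(xs, ys):
    """Intercept and slope of the least-squares regression line (the model)."""
    s_xx, _, s_xy = deviation_sums(xs, ys)
    slope = s_xy / s_xx
    # The least-squares line passes through (x_bar, y_bar)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Residual = actual (observed) value minus the value predicted by the model."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5, 6]
score = [55, 61, 60, 70, 74, 80]

r = correlation(hours, score)
a, b = least_squares_line(hours, score)
res = residuals(hours, score, a, b)

print(f"r = {r:.3f}")
print(f"predicted score = {a:.2f} + {b:.2f} * hours")
# The (x, residual) pairs are what a residual plot would display
for x, e in zip(hours, res):
    print(x, round(e, 2))
```

Because r measures only linear association, the scatterplot is still needed to judge shape. A curved or fanning pattern in the residuals suggests that a straight line is not an appropriate model. The residuals themselves make the model-versus-data distinction concrete: the line is an idealization, and no actual value has to fall on it.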