• Exploratory Data Analysis
- Again, this unit provides tools to answer questions about
the relationships between two numerical variables in a given data set
using numerical and graphical summaries.
- For numerical summaries: Just as center, spread, and shape
are used to describe univariate distributions (as in Unit 1), trend,
strength and shape are used to describe bivariate distributions.
- For graphical summaries: Just as we use boxplots,
histograms and dotplots to visualize the relationship between a
numerical variable and a categorical variable, we use
scatterplots to visualize the relationship between two numerical
variables. Residual plots are a form of scatterplot used to examine
the relationship between the numerical values generated from a model
and the corresponding numerical data.
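
As a rough illustration of the numerical and graphical summaries described above, the following Python sketch fits a least-squares line to a small data set, computes the correlation as a measure of strength, and computes the residuals (actual minus predicted values) that a residual plot would display. The data values and the helper names `mean`, `fit_line` and `correlation` are hypothetical, made up for this example; they are not taken from the course materials, and this is only one possible way to carry out the computation.

```python
import math

# Hypothetical data: made-up (x, y) pairs used only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def correlation(x, y):
    """Pearson correlation coefficient: sign gives trend, size gives strength."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


slope, intercept = fit_line(x, y)

# Predicted values come from the model; residuals compare them to the data.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
r = correlation(x, y)

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("correlation:", round(r, 3))
for xi, res in zip(x, residuals):
    # Plotting these (x, residual) pairs would give a residual plot.
    print("x =", xi, "residual =", round(res, 3))
```

Plotting the `(x, residual)` pairs as a scatterplot gives a residual plot; residuals scattered around zero with no visible pattern would suggest that a line is a reasonable description of the trend.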
- Later in the course, we will learn to determine whether
apparent linear associations might be due to chance, just as we would
like to determine whether differences between groups (as covered in
Unit 1) are due to chance.
- This unit provides students with their first glimpse of
statistical models. Throughout the course, students will learn about
various types of models that they can use to answer questions about
- It is important to emphasize the theoretical nature of
models in comparison with real data (for example, predicted values
versus actual values). This can even be related back to
the previous unit, where we discussed that a symmetric distribution might
not be perfectly symmetric. Students will have trouble with this
abstractness in upcoming units dealing with probability and simulation
models, so introducing this notion early might help alleviate that