Unit 1: Exploring Data



 Teaching Tips

• Organizational tools used depend on the question and the nature of the data. For example, with numerical summaries:
  • medians tend to be useful when data are skewed and are often reported with interquartile ranges as a measure of spread
  • means tend to be useful when data are symmetric and are often reported with standard deviations as a measure of spread
And with graphical summaries:
  • boxplots are very useful for comparing two or more groups
  • dotplots are useful for viewing the distributions of small data sets, and
  • histograms are useful for viewing the distributions of larger data sets
But keep in mind that these are very general guidelines, and experimentation is always encouraged.
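
As a rough illustration of the numerical-summary guidelines above, the sketch below uses Python's standard `statistics` module on a small, made-up right-skewed data set (the values are hypothetical, not from any real study). It shows a single extreme value pulling the mean and standard deviation much more than the median and interquartile range.

```python
import statistics

# Hypothetical right-skewed data (say, incomes in thousands); values are made up.
skewed = [22, 25, 27, 30, 31, 33, 35, 38, 40, 250]

mean = statistics.mean(skewed)      # 53.1, pulled upward by the value 250
median = statistics.median(skewed)  # 32.0, barely affected by the value 250
sd = statistics.stdev(skewed)       # sample standard deviation (n-1 divisor)

# statistics.quantiles uses one particular quartile convention by default;
# other software and textbooks may give slightly different quartiles.
q1, _, q3 = statistics.quantiles(skewed, n=4)
iqr = q3 - q1

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"median = {median:.1f}, IQR = {iqr:.1f}")
```

For skewed data like these, the median and IQR describe the bulk of the values better than the mean and standard deviation do.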

• Clear numerical and graphical summaries prevent misconceptions. The mechanics include titles, axis labels, the names of the variables being summarized, and the observational units being used.

• Insist that your students put comparative graphs on the same scale, or they may make erroneous comparisons.

• Graphs do not speak for themselves. Students (and you) should never provide a graph without an interpretation.

• Interpretation and communication are the focus. Students need to know what to ask the software for, but not how the software produces it.

• Be precise with descriptions of data: "Men make more money than women" is too vague to be meaningful. Some women make more than some men, and some women make a LOT more than 95% of the men. What might be true is that the mean/median income for men is higher than the mean/median for women.

• One of the conceptual shifts we're expecting from the students is to stop thinking in terms of individuals and exceptional values, and instead think in terms of groups and general trends. For this reason, when comparing two groups, students should focus on the centers (the "typical"), the spread (how much variety there is within a group) and the general shape (are there exceptional responses, or differences in the overall shapes of the two groups?).

• When examining the shape of a distribution, we look for the general pattern, but also for exceptions to the pattern. Exceptions include outliers, potential outliers, gaps, or anything that piques interest. For example, self-reported weights often come in multiples of five, and this can sometimes be detected in a graphic. Also, consider the effect of various bin widths on shape when using histograms.
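
To make the bin-width point concrete, here is a minimal sketch that counts values per histogram bin for different widths. The helper `bin_counts` and the weights are hypothetical, invented for illustration. With narrow bins the clumping at multiples of five is visible; with wide bins it disappears.

```python
from collections import Counter

def bin_counts(values, width, start=0):
    """Count values per bin; keys are the left edges of the non-empty bins.

    Hypothetical helper for illustration, assuming integer values and width.
    """
    counts = Counter((v - start) // width for v in values)
    return {start + b * width: counts[b] for b in sorted(counts)}

# Made-up self-reported weights in pounds; most are multiples of five.
weights = [150, 152, 155, 155, 160, 160, 160, 163,
           165, 165, 170, 170, 175, 180, 180]

print(bin_counts(weights, width=10, start=150))  # wide bins hide the rounding
print(bin_counts(weights, width=1, start=150))   # narrow bins show spikes at multiples of five

# A direct check on rounding: how many reported values are multiples of five?
multiples_of_five = sum(1 for w in weights if w % 5 == 0)
print(multiples_of_five, "of", len(weights))
```

Trying several widths on the same data is a quick way for students to see that the apparent shape depends on the choice of bins.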

• Identifying outliers is tricky and confusing. There is no technical definition of an outlier. It is instead a "term of art," meant to help us label values that are exceptional with respect to the bulk of the data. There are several techniques for identifying potential outliers, and the most common of these is the "1.5 IQR rule," which flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. An outlier is not a mathematically defined quantity, so different books use different definitions, and it's important not to get hung up on a particular one.
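
The "1.5 IQR rule" can be sketched as follows. The function name `flag_potential_outliers` and the data are hypothetical, and the quartiles come from the default convention of Python's `statistics.quantiles`, which is only one of several in use. A textbook or other software may compute slightly different fences.

```python
import statistics

def flag_potential_outliers(values, k=1.5):
    """Return values more than k IQRs below Q1 or above Q3.

    Hypothetical helper; k=1.5 gives the common "1.5 IQR rule".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up data with one exceptional value.
data = [22, 25, 27, 30, 31, 33, 35, 38, 40, 250]
print(flag_potential_outliers(data))  # [250]
```

The rule only labels values as potential outliers. Deciding what to do about them still requires looking at the data and its context.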

• n-1 in the standard deviation formula: Tell your students that we can't adequately explain why the divisor is n-1 and not n until later. For now, take it as a definition; the reasons will be explained when we talk about "unbiasedness."
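
If students ask what difference the divisor makes, a small numerical comparison may help. The sketch below uses made-up data and Python's standard `statistics` module, where `stdev` divides by n-1 and `pstdev` divides by n.

```python
import math
import statistics

# Made-up data chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                              # 5.0
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 32.0

sd_divide_by_n = math.sqrt(sum_sq_dev / n)              # 2.0
sd_divide_by_n_minus_1 = math.sqrt(sum_sq_dev / (n - 1))

# The library functions agree with the hand computation.
print(sd_divide_by_n, statistics.pstdev(data))
print(sd_divide_by_n_minus_1, statistics.stdev(data))
```

The n-1 version is always slightly larger, and the gap shrinks as the sample size grows.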

• There are certain "reserved letters" in Statistics that (almost) always represent the same concepts. For example, lower case n is always sample size, while upper case N is population size. X (upper case) is always a random variable. Lower case s is always the sample standard deviation. Lower case r is always sample correlation. You'll find more, and it's worth pointing these out to your students when they arise.

Student Misconceptions and Confusions

• There are many "fuzzy" terms in Statistics. For example:
  • "Symmetric" sample distributions are rarely precisely symmetric. Thus there is room for reasonable people to disagree about whether a distribution is symmetric or slightly skewed in a certain direction.
  • "Spread" is another term that is used colloquially and refers to the variability of the observations in the data. However, students will often misinterpret "spread" as the variability of the frequencies instead.

• Be aware that students will have misconceptions based on a clash between their informal definitions and statisticians' formal definitions. Make sure the students know what a statistician means by each of these difficult words: random, independent, expected, "on average", normal.

• There are (at least) two types of means: sample means and population means. Some books are careful about calling the sample mean the sample mean, and others just call it a mean. Later it will be very important to distinguish sample and population means.

• Students will often report a range or interquartile range as "some number to some other number." Remind students that these measures should instead be reported as the difference between those numbers, which is just one number.
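
A short sketch of the distinction, using made-up scores and Python's standard `statistics` module (whose default quartile convention may differ from a given textbook's):

```python
import statistics

# Made-up test scores.
scores = [55, 62, 68, 70, 74, 79, 85, 91]

# The range is a single number: maximum minus minimum.
data_range = max(scores) - min(scores)  # 36, not "55 to 91"

# The IQR is also a single number: third quartile minus first quartile.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print("range =", data_range)
print("IQR =", iqr)
```

The endpoints (minimum and maximum, or the two quartiles) are still worth reporting, but they are not themselves the measure of spread.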

• Students will be tempted to remove outliers. Don't let them do this. Outliers should be investigated. Sometimes, the story is the outlier.


• Link to applet with slider for bin width