Unit 1: Exploring Data
Teaching Tips
• The organizational tools you use depend on the question and the nature of the data. For example, with numerical summaries:
• Clarity in numerical and graphical summaries avoids misconceptions. The mechanics include titles, axis labels, the names of the variables being summarized, and the observational units being used.
• Insist that your students put comparative graphs on the same scale, or they may make erroneous comparisons.
• Graphs do not speak for themselves. Students (and you) should never provide a graph without an interpretation.
• Interpretation and communication are the focus. Students need to know what to ask the software for, but not how the software produces it.
• Be precise with descriptions of data: "Men make more money than women" is too vague to be meaningful. Some women make more than some men, and some women make a LOT more than 95% of the men. What might be true is that the mean (or median) income for men is higher than the mean (or median) income for women.
• One of the conceptual shifts we expect from students is to stop thinking in terms of individuals and exceptional values, and instead think in terms of groups and general trends. For this reason, when comparing two groups, students should focus on the centers (the "typical" value), the spreads (how much variety there is within each group), and the general shapes (are there exceptional responses, or differences in the overall shapes of the two groups?).
• When examining the shape of a distribution, we look for the general pattern, but also for exceptions to the pattern. Exceptions include outliers, potential outliers, gaps, or anything else that piques interest. For example, self-reported weights often come in multiples of five, and this can sometimes be detected in a graph. Also, consider how different bin widths change the apparent shape of a histogram.
• Identifying outliers is tricky and confusing. There is no technical definition of an outlier; it is a "term of art," meant to help us label values that are exceptional with respect to the bulk of the data. There are several techniques for identifying potential outliers, and the most common is the "1.5 IQR rule," which flags values more than 1.5 × IQR below the first quartile or above the third quartile. Different books use different definitions, because an outlier is not a mathematically defined quantity, so it's important not to get hung up on any particular one.
• n-1 in the standard deviation formula: Tell your students that we can't adequately explain why the divisor is n-1 rather than n until later. For now, they should take it as a definition; the reasons will be explained when we talk about "unbiasedness."
• There are certain "reserved letters" in Statistics that (almost) always represent the same concepts. For example, lower case n is always the sample size, while upper case N is the population size. Upper case X is always a random variable. Lower case s is always the sample standard deviation. Lower case r is always the sample correlation. You'll find more, and it's worth pointing these out to your students as they arise.
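To make the group-comparison and n-1 tips concrete, here is a minimal Python sketch using only the standard library. The data and the helper names `sample_sd` and `summarize` are hypothetical, invented for illustration. It summarizes each group by center (mean and median) and spread (the sample standard deviation s), and writes out the n-1 divisor explicitly so it can be checked against `statistics.stdev`.

```python
import statistics

def sample_sd(values):
    """Sample standard deviation s: divide the sum of squared deviations by n - 1, not n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return (squared_deviations / (n - 1)) ** 0.5

def summarize(name, values):
    """Center (mean, median) and spread (s) for one group. Hypothetical helper."""
    return {
        "group": name,
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "s": sample_sd(values),
    }

# Hypothetical data, invented for illustration only.
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [3, 5, 6, 6, 7, 8, 8, 20]

for summary in (summarize("A", group_a), summarize("B", group_b)):
    print(summary)

# statistics.stdev also divides by n - 1, so it should match sample_sd;
# statistics.pstdev divides by n and is smaller unless all values are equal.
print(sample_sd(group_a), statistics.stdev(group_a), statistics.pstdev(group_a))
```

In group B, the single large value pulls the mean (7.875) above the median (6.5), which is exactly the kind of difference students should notice and describe when comparing the centers of two groups.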
Student Misconceptions and Confusions
• There are many "fuzzy" terms in Statistics. For example:
• There are (at least) two types of means: sample means and population means. Some books are careful to call the sample mean the "sample mean," while others just call it "the mean." Later it will be very important to distinguish sample means from population means.
• Students will often report a range or interquartile range as "some number to some other number." Remind them that each of these measures is instead the difference between those two numbers, which is a single number.
• Students will be tempted to remove outliers. Don't let them do this. Outliers should be investigated. Sometimes, the story is the outlier.
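The range and the IQR can each be computed as a single difference, and the IQR then feeds the 1.5 IQR rule for flagging potential outliers. Below is a minimal Python sketch, assuming the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves of the data (leaving out the overall median when n is odd). The helper names `quartiles_median_of_halves` and `spread_summary`, and the data, are hypothetical. Flagged values are returned for investigation; nothing is removed.

```python
import statistics

def quartiles_median_of_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (one textbook convention)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]  # skip the middle value when n is odd
    return statistics.median(lower), statistics.median(upper)

def spread_summary(values):
    """Range, IQR, and 1.5 IQR fences. Hypothetical helper."""
    q1, q3 = quartiles_median_of_halves(values)
    iqr = q3 - q1  # one number, not "Q1 to Q3"
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "range": max(values) - min(values),  # one number, not "min to max"
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        # Flag for investigation; the data themselves are left untouched.
        "potential_outliers": [x for x in values if x < lower_fence or x > upper_fence],
    }

# Hypothetical data, invented for illustration only.
scores = [3, 5, 6, 6, 7, 8, 8, 20]
print(spread_summary(scores))

# statistics.quantiles defaults to method="exclusive", a different convention.
q1_py, _, q3_py = statistics.quantiles(scores, n=4)
print(q1_py, q3_py)  # 5.25 8.0
```

With these data the median-of-halves convention gives Q1 = 5.5, while Python's default gives Q1 = 5.25. That small disagreement is one concrete reason different books and packages can differ about which values count as potential outliers.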
Resources
• Link to applet with slider for bin width