Unit 12: Comparing Two Populations
Connections
• Exploratory Data Analysis: As mentioned in the previous units, confidence intervals and hypothesis tests help students use data to answer the investigative questions they may have developed about a particular population. Refer back to the first unit of the course here. To evaluate whether the assumptions hold, students need to examine the shape, center, and spread of each sample's distribution.
• Comparing Groups: Comparing groups has been a theme of the course since the first week. The methods presented in this unit will finally help students answer, more formally, the questions they may have asked in Unit 1: “Is the observed difference between two groups due to chance?” or “Is the observed difference between groups large enough to be real?”
• Inference: We can now make inferences about the difference between two populations. As in the previous units, reiterate that we need only one sample from each population to do so. Two-sample intervals and tests still rely on sampling distributions and, when its conditions are met, the Central Limit Theorem, so show students how these concepts relate. If the Central Limit Theorem applies, students will need to recall normal models and z-scores. Remember: their samples may not be normal, but under the Central Limit Theorem the sampling distribution of the difference between their sample means or proportions is approximately normal. It is very important to connect back to the informal, simulation-based inference done earlier. Show students the same example using both methods so they can see that both achieve the same goal; the normal-model method is simply a shortcut (an approximation) that can be faster to carry out. If students understand the intuition behind simulation-based inference, they can see formal hypothesis testing as a shortcut that, when its assumptions are met, does the same thing with less work.
A p-value is the probability of getting the observed statistic, or one more extreme, if we assume a certain model or hypothesis is true. This holds whether we use the normal model (and find the area of the shaded region beyond the observed value) or a model built from many simulations (and count the simulated values at least as extreme as the observed value).
• Models: In this course, we assume a sampling distribution follows a model with some hypothesized mean. With two samples, we use the sampling distribution of the difference between the two sample statistics, and the hypothesized mean is typically zero (no difference between the populations). In every case, we start by assuming some “chance” model that could plausibly have produced our estimate, then ask: given the mean of that model, how plausible is our estimate? If it is not very plausible, we should look for a new model.
• Probability: The p-value is a conditional probability. Given that the null hypothesis is true, what is the probability of getting the observed statistic or something more extreme?
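The Inference, Models, and Probability points can be seen side by side in one example. The sketch below is a hypothetical illustration in Python (standard library only), not part of the unit's own materials. It asks whether two population proportions differ, once with the normal-model shortcut (a two-sample z-test using a pooled proportion) and once by simulation (a permutation test that reshuffles group labels under the null "chance" model of no difference, whose mean difference is zero). The counts 45/100 and 30/100 are made up, and the function names, the 10,000 repetitions, and the seed are arbitrary choices.

```python
import math
import random


def two_prop_z_test(x1, n1, x2, n2):
    """Normal-model shortcut: two-sided p-value for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both populations share one proportion, so pool the successes.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Shaded area in both tails beyond |z| under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def permutation_test(x1, n1, x2, n2, reps=10_000, seed=1):
    """Simulation: two-sided p-value from reshuffling group labels under H0."""
    rng = random.Random(seed)
    observed = x1 / n1 - x2 / n2
    # Pool all outcomes (1 = success, 0 = failure), ignoring group labels.
    pooled = [1] * (x1 + x2) + [0] * (n1 + n2 - x1 - x2)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # First n1 shuffled outcomes play group 1, the rest play group 2.
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        # Count differences at least as far from 0 as the observed one
        # (a small tolerance guards against floating-point round-off).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / reps


# Hypothetical data: 45 of 100 in group A and 30 of 100 in group B said "yes".
z, p_normal = two_prop_z_test(45, 100, 30, 100)
diff, p_sim = permutation_test(45, 100, 30, 100)
print(f"observed difference = {diff:.2f}, z = {z:.2f}")
print(f"normal-model p-value: {p_normal:.4f}")
print(f"simulation p-value:   {p_sim:.4f}")
```

Both functions answer the same conditional question: if the two populations really had the same proportion, how often would chance alone produce a difference at least this large? The two p-values should be close but not identical, because the normal curve only approximates the discrete distribution of reshuffled counts, and the simulated value also varies slightly with the seed and the number of repetitions.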