Unit 13: Chi-square Tests
Data Analysis & Activity
This time we offer two activities that you can do in your
class. We thought that this was already a fairly "hands-on" unit, and
you'll get your activities in the milestone and data collection
sections. So, for your students...
Activity 1: The Token Activity
There is a good opportunity here for students to invent their own statistic, which is a good learning experience. Here is a classroom activity you might try. Bring in a paper bag containing, say, 7 blue tokens, 2 red tokens, and 1 green token. Write on the blackboard: "H0: This bag contains 20% blue tokens, 30% red tokens, and 50% green tokens". That's your claim to the students. Now draw 10 tokens from the bag (with replacement, simulating a large population) and tally their colors on the board. Ask the students if they believe the claim. "No? Why not?" They will (hopefully) say that the data don't look like they came from that distribution. "How far off are the data?" you ask.
To answer the question, a statistic is needed: some number that summarizes the discrepancy between the observed data and the claimed distribution. Have the students invent such a number. (By "number" we mean a process for computing such a number.)
A common student solution is the sum of the absolute values of the differences between the observed proportions and the claimed proportions. For example, if your claim for blue, red, green was 0.2, 0.3, 0.5, and you observed counts of 8, 1, 1 (proportions 0.8, 0.1, 0.1), then the value of the statistic would be |0.2-0.8|+|0.3-0.1|+|0.5-0.1|=1.2. Now you ask the students, "Okay, we got 1.2 as our discrepancy measure. Is that large? Or is that the sort of number we might easily have gotten by chance?" They may say it's large, but they will not be able to justify it easily.
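The arithmetic above can be checked with a few lines of Python. This is only a sketch of the student-invented statistic; the function name `abs_discrepancy` is made up for illustration and is not a standard statistic.

```python
def abs_discrepancy(observed_counts, claimed_proportions):
    """Sum of |observed proportion - claimed proportion| over all categories."""
    n = sum(observed_counts)
    return sum(
        abs(count / n - p)
        for count, p in zip(observed_counts, claimed_proportions)
    )


claimed = [0.2, 0.3, 0.5]  # blue, red, green under H0
observed = [8, 1, 1]       # tally from the 10 draws in the example

discrepancy = abs_discrepancy(observed, claimed)
print(round(discrepancy, 2))  # 1.2
```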
Hopefully they will suggest a simulation, but if not, you can do so. Each student gets a paper bag and constructs his or her own population of tokens in the bag. It is important for students to put the tokens in themselves so they'll be forced to think about what goes in: the claimed distribution (correct) or the observed distribution (incorrect). They are to draw 10 at random with replacement, just as you did, and compute the value of the test statistic for their data. They probably will have time to repeat this several times. Then collectively they should plot all their results on a histogram on the blackboard. Now the students can assess how unusual the actual 1.2 is. It should be pretty unusual, and therefore you reject H0. After this activity is completed (it will take a whole class period), students may be assigned as homework to read about the chi-square goodness-of-fit test and come to class prepared to say how it is different from the class activity. (The only difference is the computation of the test statistic.)
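If you want to preview what the class histogram is likely to show, the paper-bag simulation can be mimicked on a computer. The sketch below assumes the claimed distribution (0.2, 0.3, 0.5), 10 draws with replacement, and the observed tally 8, 1, 1 from the example; the function names, the number of repeats, and the seed are arbitrary choices for illustration.

```python
import random


def abs_discrepancy(observed_counts, claimed_proportions):
    """Sum of |observed proportion - claimed proportion| over all categories."""
    n = sum(observed_counts)
    return sum(
        abs(count / n - p)
        for count, p in zip(observed_counts, claimed_proportions)
    )


def simulate_null(claimed_proportions, n_draws=10, n_repeats=10_000, seed=13):
    """Draw repeatedly from the CLAIMED distribution and record the statistic."""
    rng = random.Random(seed)
    categories = range(len(claimed_proportions))
    results = []
    for _ in range(n_repeats):
        # Sampling with replacement from a bag filled according to H0.
        draws = rng.choices(categories, weights=claimed_proportions, k=n_draws)
        counts = [draws.count(c) for c in categories]
        results.append(abs_discrepancy(counts, claimed_proportions))
    return results


claimed = [0.2, 0.3, 0.5]  # blue, red, green under H0
observed_stat = abs_discrepancy([8, 1, 1], claimed)

simulated = simulate_null(claimed)
# Fraction of simulated statistics at least as large as the observed one
# (small tolerance guards against floating-point rounding).
p_value = sum(s >= observed_stat - 1e-9 for s in simulated) / len(simulated)
print(p_value)
```

The printed fraction plays the role of the blackboard histogram: it estimates how often a discrepancy as large as the observed one turns up when H0 is true.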
This introduction to the chi-square statistic is less confusing to students than diving into the chi-square right away because they see the big picture easily and they see that the point is to estimate the magnitude of the discrepancy between observed categorical data and a claimed distribution. After that, the actual chi-square statistic is much easier for students to grasp.
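For comparison with the invented statistic, here is a minimal sketch of the usual chi-square goodness-of-fit statistic, the sum over categories of (observed - expected)^2 / expected, applied to the same token example. The function name is an illustrative choice. With only 10 draws the expected counts (2, 3, 5) are small, so treat this as arithmetic practice rather than a formal test.

```python
def chi_square_statistic(observed_counts, claimed_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed_counts)
    expected = [n * p for p in claimed_proportions]
    return sum(
        (o - e) ** 2 / e
        for o, e in zip(observed_counts, expected)
    )


claimed = [0.2, 0.3, 0.5]  # blue, red, green under H0
observed = [8, 1, 1]       # tally from the 10 draws in the example

chi_square = chi_square_statistic(observed, claimed)
# (8-2)^2/2 + (1-3)^2/3 + (1-5)^2/5 = 18 + 1.333... + 3.2
print(round(chi_square, 2))  # 22.53
```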
If you do this lesson with your class, you should not let them
think that their invented statistic is going to be on "the test". They
should be told that the point was not the particular statistic they
came up with, but the process of measuring discrepancy and using
simulation to quantify how probable such a discrepancy is under the
null. From now on, they should use the chi-square statistic.
Activity 2: The M&M's, or Possibly Skittles, Activity
Are M&M's produced with the same color frequencies as the Mars website claims? Hand out regular-sized bags of regularly colored M&M's. (Don't use the large or king-sized bags: too much to count. And don't use the small bags: not enough data.) Ask your class to compute the chi-square statistic for their bag. You can then plot these statistics on the board. The distribution of these statistics, however, will NOT look like a chi-square distribution. Why not? Because M&M's do NOT follow the claimed distribution. (The horror, the horror!) And so the sampling distribution of the chi-square statistic in this case is NOT a chi-square distribution!
You might also try (assuming you live in a district with good
dentists) the same experiment with Skittles. Skittles, I am happy to
report, do follow the advertised distribution of colors (uniform). And
so if you plot your students' individual chi-square statistics you will
see approximately a chi-square distribution. Also, if everyone tests at the 5% significance level, about 5% of your class should
reject the null hypothesis, even though the rest don't. A nice lesson
in Type I error!
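The "about 5% reject" claim can be previewed with a simulation. The sketch below makes several assumptions that are not part of the activity itself: 5 equally likely colors, 60 candies per bag (a made-up bag size), each bag treated as a random sample, and a large imaginary class of 5000 students so the rejection rate is stable. The critical value 9.488 is the upper 5% point of the chi-square distribution with 4 degrees of freedom, so it only applies when there are 5 colors.

```python
import random

# Upper 5% point of chi-square with 4 degrees of freedom (5 colors - 1).
CRITICAL_VALUE = 9.488


def chi_square_statistic(observed_counts, claimed_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed_counts)
    expected = [n * p for p in claimed_proportions]
    return sum(
        (o - e) ** 2 / e
        for o, e in zip(observed_counts, expected)
    )


def simulate_class(n_students=5000, bag_size=60, n_colors=5, seed=13):
    """One chi-square statistic per simulated bag, with H0 (uniform) true."""
    rng = random.Random(seed)
    uniform = [1 / n_colors] * n_colors
    statistics = []
    for _ in range(n_students):
        candies = [rng.randrange(n_colors) for _ in range(bag_size)]
        counts = [candies.count(c) for c in range(n_colors)]
        statistics.append(chi_square_statistic(counts, uniform))
    return statistics


statistics = simulate_class()
# Share of simulated students who would (wrongly) reject a true H0.
rejection_rate = sum(s > CRITICAL_VALUE for s in statistics) / len(statistics)
print(rejection_rate)
```

Every rejection in this simulation is a Type I error, because the bags really were filled from the uniform distribution.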
This activity comes from the DeVeaux and Velleman book. Ask your class (this is the data collection part) to select one of the numbers 1, 2, 3 or 4. Write the numbers on the board and ask each student to pick one at random. The claim is that most groups of people will select 1, 2, 3 and 4 in these proportions: 5%, 10%, 75%, 10%.
Here's the analysis part: is your class in agreement with the DeVeaux/Velleman claim?
Post your data and conclusion on the Discussion Board, if your instructor is using a discussion forum.
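One way to carry out the analysis is a chi-square goodness-of-fit test against the claimed proportions. The class tally below is invented purely for illustration (a hypothetical class of 40); substitute your own counts. The critical value 7.815 is the upper 5% point of the chi-square distribution with 3 degrees of freedom. Note that with a small class the expected counts for 1, 2 and 4 fall below the common rule of thumb of 5, so the chi-square approximation is rough and a simulation may be a better guide.

```python
# Upper 5% point of chi-square with 3 degrees of freedom (4 numbers - 1).
CRITICAL_VALUE = 7.815


def chi_square_statistic(observed_counts, claimed_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    n = sum(observed_counts)
    expected = [n * p for p in claimed_proportions]
    return sum(
        (o - e) ** 2 / e
        for o, e in zip(observed_counts, expected)
    )


claimed = [0.05, 0.10, 0.75, 0.10]  # claimed shares for the numbers 1, 2, 3, 4
class_counts = [3, 5, 27, 5]        # HYPOTHETICAL tally, not real data

statistic = chi_square_statistic(class_counts, claimed)
reject_claim = statistic > CRITICAL_VALUE
print(round(statistic, 2), reject_claim)  # 1.3 False
```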