Unit 13: Chi-square Tests



Data Analysis & Activity

This time we offer three activities that you can do in your class. We thought that this was already a fairly "hands-on" unit, and you'll get your own activities in the milestone and data collection sections. So, these are for your students...

Activity 1: The Token Activity

This activity gives students the chance to invent their own statistic, which is a good learning experience. Bring in a paper bag containing, say, 7 blue tokens, 2 red tokens, and 1 green token. Write on the blackboard: "H0: This bag contains 20% blue tokens, 30% red tokens, and 50% green tokens". That's your claim to the students. Now draw 10 tokens from the bag (with replacement, simulating a large population) and tally their colors on the board. Ask the students if they believe the claim. "No? Why not?" They will (hopefully) say that the data don't look like they came from that distribution. "How far off are the data?" you ask.

To answer the question, a statistic is needed: some number that summarizes the discrepancy between the observed data and the claimed distribution. Have the students invent such a number. (By "number" we mean a process for computing such a number.)

A common student solution is the sum of the absolute values of the differences between the observed proportions and the claimed proportions. For example, if your claim for blue, red, green was 0.2, 0.3, 0.5, and you observed 8 blue, 1 red, and 1 green, then the value of the statistic would be |0.2-0.8|+|0.3-0.1|+|0.5-0.1|=1.2. Now you ask the students, "Okay, we got 1.2 as our discrepancy measure. Is that large? Or is that the sort of number we might easily have gotten by chance?" They may say it's large, but they will not be able to justify it easily.
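For instructors who want to check the arithmetic or project it in class, the students' statistic can be sketched in a few lines of Python. The function name abs_discrepancy is our own label for illustration, not a standard name.

```python
def abs_discrepancy(observed_counts, claimed_props):
    """Sum of absolute differences between observed and claimed proportions."""
    n = sum(observed_counts)
    return sum(abs(count / n - p) for count, p in zip(observed_counts, claimed_props))


claimed = [0.2, 0.3, 0.5]  # blue, red, green under H0
observed = [8, 1, 1]       # tallies from the 10 draws in the example above

print(round(abs_discrepancy(observed, claimed), 2))  # prints 1.2
```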

Hopefully they will suggest a simulation, but if not, you can do so. Each student gets a paper bag and constructs his or her own population of tokens in the bag. It is important for students to put the tokens in themselves so they'll be forced to think about what goes in: the claimed distribution (correct) or the observed distribution (incorrect). They are to draw 10 at random with replacement, just as you did, and compute the value of the test statistic for their data. They probably will have time to repeat this several times. Then collectively they should plot all their results on a histogram on the blackboard. Now the students can assess how unusual the actual 1.2 is. It should be pretty unusual, and therefore you reject H0. After this activity is completed (it will take a whole class period), students may be assigned as homework to read about the chi-square goodness-of-fit test and come to class prepared to say how it is different from the class activity. (The only difference is the computation of the test statistic.)
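If you would like to preview what the class histogram might look like, the paper-bag simulation can be mimicked on a computer. This is only a sketch under our own assumptions: the function names, the seed, and the choice of 5000 repetitions are arbitrary, and the tokens are drawn from the claimed distribution, just as the students' bags should be filled.

```python
import random


def abs_discrepancy(observed_counts, claimed_props):
    """Sum of absolute differences between observed and claimed proportions."""
    n = sum(observed_counts)
    return sum(abs(c / n - p) for c, p in zip(observed_counts, claimed_props))


def simulate_null(claimed_props, n_draws, n_sims, rng):
    """Repeat the bag experiment n_sims times, drawing from the CLAIMED distribution."""
    categories = list(range(len(claimed_props)))
    stats = []
    for _ in range(n_sims):
        draws = rng.choices(categories, weights=claimed_props, k=n_draws)
        counts = [draws.count(cat) for cat in categories]
        stats.append(abs_discrepancy(counts, claimed_props))
    return stats


rng = random.Random(13)      # arbitrary seed so the run is repeatable
claimed = [0.2, 0.3, 0.5]    # blue, red, green under H0
null_stats = simulate_null(claimed, n_draws=10, n_sims=5000, rng=rng)

observed_stat = abs_discrepancy([8, 1, 1], claimed)
# Small tolerance guards against floating-point ties at exactly 1.2.
share = sum(s >= observed_stat - 1e-9 for s in null_stats) / len(null_stats)
print(f"share of simulated statistics at least {observed_stat:.1f}: {share:.4f}")
```

The printed share plays the role of the students' judgment from the blackboard histogram: the smaller it is, the more unusual the observed 1.2 would be if H0 were true.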

This introduction to the chi-square statistic is less confusing to students than diving into the chi-square right away because they see the big picture easily and they see that the point is to estimate the magnitude of the discrepancy between observed categorical data and a claimed distribution. After that, the actual chi-square statistic is much easier for students to grasp.
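To connect the invented statistic to the real one, here is a short sketch that computes the chi-square statistic, the sum over categories of (observed - expected)^2 / expected, for the same token data. The function name chi_square_stat is ours, chosen for illustration.

```python
def chi_square_stat(observed_counts, claimed_props):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    n = sum(observed_counts)
    expected = [n * p for p in claimed_props]  # 2, 3, 5 for the token example
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))


stat = chi_square_stat([8, 1, 1], [0.2, 0.3, 0.5])
print(round(stat, 2))  # 18 + 4/3 + 3.2, prints 22.53
```

With only 10 draws the expected counts here (2, 3, and 5) are small, so the usual chi-square approximation is rough for this example. The simulation approach from the class activity does not depend on that approximation.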

If you do this lesson with your class, you should not let them think that their invented statistic is going to be on "the test". They should be told that the point was not the particular statistic they came up with, but the process of measuring discrepancy and using simulation to quantify how probable such a discrepancy is under the null. From now on, they should use the chi-square statistic.

Activity 2: The M&M's, or Possibly Skittles, Activity

Are M&M's produced with the same color frequencies as the Mars website claims? Hand out regular-sized bags of regularly colored M&M's. (Don't use the large or king-sized bags: too much to count. And don't use the small bags: not enough data.) Ask your class to compute the chi-square statistic for their bag. You can then plot these statistics on the board. The distribution of these statistics, however, will NOT look like a chi-square distribution. Why not? Because M&M's do NOT follow the claimed distribution. (The horror, the horror!) And so the sampling distribution of the chi-square statistic in this case is NOT a chi-square distribution!
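A quick simulation can show why the class plot drifts away from a chi-square distribution when the claim is wrong. The sketch below rests on loud assumptions: the "true" color mix (uniform) and the bag size of 55 are hypothetical values picked only for illustration, not measurements. The claimed percentages are the ones quoted in this section.

```python
import random

CLAIMED = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]  # brown, yellow, red, blue, orange, green
HYPOTHETICAL_TRUE = [1 / 6] * 6                 # ASSUMPTION: made-up true mix, illustration only
BAG_SIZE = 55                                   # ASSUMPTION: candies per bag


def chi_square_stat(observed_counts, claimed_props):
    """Pearson chi-square statistic against the claimed proportions."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, claimed_props))


def simulate_bags(true_props, claimed_props, bag_size, n_bags, rng):
    """Fill bags from true_props, but test each one against claimed_props."""
    colors = list(range(len(true_props)))
    stats = []
    for _ in range(n_bags):
        bag = rng.choices(colors, weights=true_props, k=bag_size)
        counts = [bag.count(color) for color in colors]
        stats.append(chi_square_stat(counts, claimed_props))
    return stats


rng = random.Random(27)  # arbitrary seed
null_stats = simulate_bags(CLAIMED, CLAIMED, BAG_SIZE, 2000, rng)
alt_stats = simulate_bags(HYPOTHETICAL_TRUE, CLAIMED, BAG_SIZE, 2000, rng)

mean_null = sum(null_stats) / len(null_stats)
mean_alt = sum(alt_stats) / len(alt_stats)
print(f"mean statistic when the claim is true:  {mean_null:.2f}")
print(f"mean statistic when the claim is false: {mean_alt:.2f}")
```

When the claim is true, the statistics should average close to the degrees of freedom (5 for six colors). When the bags are filled from a different mix, the whole distribution shifts to the right, which is what the class should see on the board.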

You might also try (assuming you live in a district with good dentists) the same experiment with Skittles. Skittles, I am happy to report, do follow the advertised distribution of colors (uniform). And so if you plot your students' individual chi-square statistics you will see something close to a chi-square distribution. Also, about 5% of your class should reject the null hypothesis, even though the rest don't. A nice lesson in Type I error!
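The "about 5%" figure can be previewed with a simulation of a very large imaginary class. This is a sketch under stated assumptions: we assume five colors and 60 candies per bag (neither number comes from the text above), and we use 9.488 as the 5% critical value for a chi-square distribution with 4 degrees of freedom.

```python
import random

N_COLORS = 5            # ASSUMPTION: five Skittles colors
BAG_SIZE = 60           # ASSUMPTION: candies per bag
CRITICAL_VALUE = 9.488  # 95th percentile of chi-square with 4 degrees of freedom
N_STUDENTS = 2000       # far larger than a real class, to steady the estimate


def chi_square_uniform(counts):
    """Chi-square statistic when every color has the same expected count."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(29)  # arbitrary seed
rejections = 0
for _ in range(N_STUDENTS):
    # Each imaginary student opens a bag filled from a truly uniform color mix.
    bag = [rng.randrange(N_COLORS) for _ in range(BAG_SIZE)]
    counts = [bag.count(color) for color in range(N_COLORS)]
    if chi_square_uniform(counts) > CRITICAL_VALUE:
        rejections += 1

rejection_rate = rejections / N_STUDENTS
print(f"share of students rejecting a true null: {rejection_rate:.3f}")
```

In a real class of 25 or 30 students the count of rejections will bounce around quite a bit, so seeing zero, one, or three rejections would not be surprising.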

The M&M's web site lists the claimed distribution of colors. Last time we checked, it was 13% brown, 14% yellow, 13% red, 24% blue, 20% orange, and 16% green.

Activity 3: 

This activity comes from the DeVeaux and Velleman book. Ask your class (this is the data collection part) to select one of the numbers 1, 2, 3, or 4. Write the numbers on the board and ask each student to select one at random. The claim is that most collections of people will select 1, 2, 3, and 4 in these proportions: 5%, 10%, 75%, 10%.

Here's the analysis part: is your class in agreement with the DeVeaux/Velleman claim?
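One way the analysis might go is sketched below. The class tallies are hypothetical numbers invented for illustration (a class of 30); substitute your own. The sketch also lists which expected counts fall below 5, the usual rule-of-thumb threshold under which the chi-square approximation becomes unreliable. The value 7.815 is the 5% critical value for 3 degrees of freedom.

```python
CLAIMED = [0.05, 0.10, 0.75, 0.10]  # claimed proportions for choosing 1, 2, 3, 4
class_counts = [2, 4, 20, 4]        # HYPOTHETICAL tallies for a class of 30
CRITICAL_VALUE = 7.815              # 95th percentile of chi-square with 3 degrees of freedom

n = sum(class_counts)
expected = [n * p for p in CLAIMED]

# Pearson chi-square statistic for the class data.
stat = sum((o - e) ** 2 / e for o, e in zip(class_counts, expected))

# Numbers (1-4) whose expected count is below the rule-of-thumb minimum of 5.
small_cells = [number for number, e in enumerate(expected, start=1) if e < 5]

print(f"chi-square statistic: {stat:.2f}")
print(f"reject at the 5% level: {stat > CRITICAL_VALUE}")
print(f"numbers with expected count below 5: {small_cells}")
```

With a class this size, three of the four expected counts are below 5, so the comparison to 7.815 should be treated with caution. A simulation like the one in the token activity is a reasonable alternative for judging how unusual the class result is.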

Post your data and conclusion on the Discussion Board, if your instructor is using a discussion forum.