• Exploratory Data Analysis
• Comparing Groups
- Earlier we did informal examinations to determine whether a
distribution was Normal. The chi-square procedure provides a
formal test of whether the distribution of a categorical variable has a
specific shape. There are better ways of doing this, but treating the
bins of a histogram as categories gives one way to test whether a
histogram is consistent with a Normal model.
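A minimal Python sketch of the histogram-bins idea, using only the standard library. It standardizes the data, counts how many values fall in each bin, compares those counts with what a Normal model predicts, and refers the statistic to a chi-square distribution. The names `normal_gof` and `chi2_sf` are hypothetical helpers. The bin cut points are chosen by the caller, and subtracting one degree of freedom for each estimated parameter (mean and SD) is the usual textbook convention, which is only approximate here.

```python
import math
import statistics


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square distribution with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function;
    fine for classroom-sized statistics, not a production-grade routine.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def normal_gof(data, z_edges):
    """Chi-square goodness-of-fit test of a Normal model, treating histogram bins as categories.

    z_edges are the interior bin cut points in standardized (z-score) units;
    the two outer bins are open-ended. Mean and SD are estimated from the data.
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    z = [(x - mean) / sd for x in data]

    # Probability of each bin under the standard Normal model.
    std_normal = statistics.NormalDist()
    cdf = [0.0] + [std_normal.cdf(e) for e in z_edges] + [1.0]
    expected = [len(data) * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

    # Observed counts in bins (-inf, e0], (e0, e1], ..., (e_last, inf).
    observed = [0] * len(expected)
    for value in z:
        observed[sum(value > e for e in z_edges)] += 1

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # k bins, minus 1, minus 2 estimated parameters (approximate convention).
    df = len(expected) - 1 - 2
    return stat, df, chi2_sf(stat, df)
```

For instance, `normal_gof(data, [-1.5, -0.5, 0.5, 1.5])` uses five bins and so has 2 degrees of freedom. A large p-value says only that the binned counts are consistent with a Normal model; it does not prove Normality.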
- The chi-square test also provides a means of testing whether two
categorical variables are associated. Later you will see that we can
test whether two numerical variables are associated by testing whether
the slope in a linear regression is 0.
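A minimal Python sketch of the chi-square test of independence for a two-way table, using only the standard library. Each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The function names are hypothetical, and `chi2_sf` is a small series-based helper rather than a library routine.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square distribution (series method)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Chi-square test of independence for a two-way table of observed counts.

    table is a list of rows; each row holds the counts for one category of the
    first variable across the categories of the second variable.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)
```

A table whose rows are exact multiples of each other, such as `[[10, 20], [20, 40]]`, matches the expected counts exactly and gives a statistic of 0.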
- The chi-square statistic's sampling distribution is only
approximated by the chi-square distribution, and this approximation
can be poor, particularly for small sample sizes. In some cases,
"exact" tests, such as Fisher's Exact Test, can be used instead. These
procedures, deemed impractical before the advent of fast computing,
provide the exact p-value regardless of sample size.
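A minimal Python sketch of Fisher's Exact Test for a 2×2 table, using only the standard library. With all row and column totals held fixed, the count in the top-left cell follows a hypergeometric distribution, so each possible table's probability can be computed directly. The two-sided p-value below adds up the probabilities of all tables that are no more likely than the observed one. This is one common convention and is assumed here. The function name is hypothetical; SciPy offers a library version as `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    # Range of top-left values that are consistent with the fixed margins.
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
```

Because the p-value is an exact sum over a finite set of tables, no large-sample approximation is needed. The cost is enumerating every possible table, which is why fast computing made these tests practical.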
- Unlike the previous hypothesis tests, the chi-square tests extend
beyond testing a particular value of a parameter. Still, students
should be in the habit of writing null and alternative hypotheses and
checking assumptions.
- As noted in the Teaching Tips, point out the connection between
the goodness-of-fit test with two categories and the one-proportion
z-test, and between the test of homogeneity for a 2×2 table and the
two-proportion z-test.
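A minimal Python sketch of the goodness-of-fit / one-proportion connection. With only two categories (success and failure), the chi-square goodness-of-fit statistic equals the square of the one-proportion z statistic, and the chi-square p-value with 1 df equals the two-sided z p-value. The function names are hypothetical, and the p-values use the identity P(χ²₁ ≥ x) = P(|Z| ≥ √x).

```python
import math


def one_prop_z(successes, n, p0):
    """One-proportion z statistic for H0: p = p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def gof_two_categories(successes, n, p0):
    """Chi-square goodness-of-fit statistic with two categories (success/failure)."""
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def two_sided_z_p(z):
    """P(|Z| >= |z|) for a standard Normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_df1_p(chi2):
    """P(chi-square with 1 df >= chi2), which equals P(|Z| >= sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, 60 successes in 100 trials with p0 = 0.5 gives z = 2 and a chi-square statistic of 4, up to floating-point rounding.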
- The chi-square distribution with 1 degree of freedom is the
distribution you would get if you took a standard Normal random
variable and squared it. Imagine this experiment: you want to see
whether eye color (dark/light) is related to whether students wear
corrective lenses. You can do a chi-square test of independence and
get a chi-square statistic. You can also do a z-test to compare two
proportions (using the pooled proportion): is the proportion of
light-eyed students who wear corrective lenses the same as the
proportion of dark-eyed students who wear corrective lenses? Square the
z-statistic and you get the chi-square statistic, and the two-sided
z-test gives the same p-value as the chi-square test. You can show
algebraically (should you wish) that the two test statistics agree,
but of course this only works for 2×2 tables. Want to see
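A minimal Python sketch of the 2×2 claim, using only the standard library. It computes the pooled two-proportion z statistic and the chi-square statistic for the same table, here framed as group 1 = light-eyed and group 2 = dark-eyed, with "success" = wears corrective lenses. The function names are hypothetical and any counts you pass in are your own; no continuity correction is applied to either statistic.

```python
import math


def two_prop_z_pooled(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled proportion (as under H0: p1 = p2)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2x2 table of successes and failures in two groups."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence (equivalently, under p1 = p2).
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def p_values(z, chi2):
    """Two-sided Normal p-value for z and upper-tail chi-square (1 df) p-value for chi2."""
    p_z = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|)
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square_1 >= chi2)
    return p_z, p_chi2
```

With any 2×2 counts, `two_prop_z_pooled(...) ** 2` and `chi_square_2x2(...)` agree up to rounding, and so do the two p-values.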
- As in the previous two units, remember that the p-value is
just a conditional probability: given the null hypothesis, what is the
probability of getting the observed statistic (or something more