Data Analysis
& Activity
Activity 1
Why do we need the t-distribution? This short activity uses
simulations on the TI-83 calculator to see why we use another
distribution (t instead of z) when we are performing inference on a
sample mean and the standard deviation is unknown (as is always the
case in practice). It could also be done on the TI-89 calculator or in
Fathom, though the instructions here are particular to the TI-83.
We will first simulate a sample of three women's heights, then
we will compute and standardize the sample mean assuming the standard
deviation is known. The following line simulates the sample of heights,
taking the mean to be 65 inches and the standard deviation to be 2.8
inches.
randNorm(65, 2.8, 3)
"randNorm" is found on the TI-83 under the MATH > PRB menu. You will
need to scroll right after simulating the sample to see the entire list
of three. If you press "enter" again and again, the same command is
executed repeatedly, so you can easily simulate many samples of three.
To compute and standardize the sample mean, enter the following two
commands, separated by a colon. The colon is the alpha function of the
decimal key.
randNorm(65, 2.8, 3)>L1:(mean(L1)-65)/(2.8/root(3))
The ">" represents the "store as" function, located just above the ON
button, and "root" stands for the square root function. mean( ) is
found under the 2nd > LIST > MATH menu.
If you enter this command and then press enter several times, you will
be simulating standardized sample means. We know from theory that the
distribution of this statistic should be the standard normal, so you
should be seeing numbers that are mostly between -2 and 2. It would be
very unusual to see a value larger than 3 in magnitude.
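If you would like to see many repetitions at once, the same simulation
can be sketched in Python. This is only an illustration and not part of
the TI-83 instructions: it assumes Python's standard "random" module,
and the names z_statistic and z_values, the seed, and the choice of
10,000 repetitions are our own.

```python
import math
import random

MU, SIGMA, N = 65.0, 2.8, 3  # mean, known standard deviation, sample size from the activity


def z_statistic(rng):
    """Simulate one sample of N heights and standardize its mean using the known SIGMA."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    return (sample_mean - MU) / (SIGMA / math.sqrt(N))


rng = random.Random(1)  # fixed seed (our choice) so the run is repeatable
z_values = [z_statistic(rng) for _ in range(10_000)]

# Proportion of standardized means within 2 of zero, and beyond 3 in magnitude.
frac_within_2 = sum(abs(z) <= 2 for z in z_values) / len(z_values)
frac_beyond_3 = sum(abs(z) > 3 for z in z_values) / len(z_values)
print(f"within 2: {frac_within_2:.3f}   beyond 3: {frac_beyond_3:.4f}")
```

For a standard normal, about 95% of values fall between -2 and 2 and
only about 0.3% exceed 3 in magnitude, so the two printed proportions
should land near those figures.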
Now we will repeat the simulation using the sample standard deviation
instead of 2.8.
randNorm(65, 2.8, 3)>L1:(mean(L1)-65)/(stdev(L1)/root(3))
The command stdev( ) is found under the 2nd > LIST > MATH menu, where
it appears as stdDev(.
If you enter this command and then press enter several times, you will
be simulating standardized sample means using the sample standard
deviation. You should see a difference between this simulation's
results and the last one. Values larger than 3 in magnitude are not
nearly so uncommon as before. This is the reason we have the
t-distribution. It has heavier tails than the standard normal and takes
into account the extra variability that comes from not knowing sigma.
If you do this in the classroom, it is interesting to have students say
out loud any values they get that exceed 3 in magnitude. It will happen
very infrequently with the first simulation, but quite frequently with
the second.
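A rough Python version of the second simulation makes the contrast easy
to count. As before, this is a sketch under our own assumptions (the
standard "random" and "statistics" modules, a fixed seed, 10,000
repetitions, and the hypothetical names t_statistic and t_values);
statistics.stdev computes the sample standard deviation with the n-1
divisor.

```python
import math
import random
import statistics

MU, SIGMA, N = 65.0, 2.8, 3  # population values and sample size from the activity


def t_statistic(rng, n=N):
    """Simulate one sample and standardize its mean using the SAMPLE standard deviation."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    sample_sd = statistics.stdev(sample)  # n-1 divisor
    return (statistics.mean(sample) - MU) / (sample_sd / math.sqrt(n))


rng = random.Random(2)  # fixed seed (our choice) so the run is repeatable
t_values = [t_statistic(rng) for _ in range(10_000)]

# Proportion of statistics exceeding 3 in magnitude.
frac_t_beyond_3 = sum(abs(t) > 3 for t in t_values) / len(t_values)
print(f"beyond 3 when sigma is estimated: {frac_t_beyond_3:.3f}")
```

With samples of three, the statistic follows a t-distribution with 2
degrees of freedom, for which values beyond 3 in magnitude occur a bit
under 10% of the time, compared with about 0.3% for the standard
normal.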
It is interesting to stop when you get a very large value of the
statistic and then go look at the contents of list L1. Generally, you
will see three numbers that are all somewhat far from the mean of 65
inches, but the three numbers will be close to one another, producing a
small sample standard deviation. The numerator of the statistic is
largish, the denominator is small, hence the large t-statistic. Such an
occurrence in real life with a real sample would be misleading: you
would see little variability and would have a lot of confidence in your
results, but in fact they would (unknown to you) be unusually far from
the mean. The t-distribution quantifies how often such atypical samples
occur.
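To look at one of these atypical samples without waiting for it to turn
up by hand, you could let a short Python loop search for one. This is a
sketch only: the function name find_extreme_sample, the cutoff of 10,
the seed, and the cap on the number of tries are all our own choices.

```python
import math
import random
import statistics

MU, SIGMA, N = 65.0, 2.8, 3  # population values and sample size from the activity


def find_extreme_sample(rng, threshold=10.0, max_tries=100_000):
    """Draw samples of N heights until one gives a statistic beyond the threshold."""
    for _ in range(max_tries):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        mean = statistics.mean(sample)
        sd = statistics.stdev(sample)  # sample standard deviation (n-1 divisor)
        t = (mean - MU) / (sd / math.sqrt(N))
        if abs(t) > threshold:
            return sample, mean, sd, t
    raise RuntimeError("no sample exceeded the threshold; raise max_tries")


sample, mean, sd, t = find_extreme_sample(random.Random(3))
print("sample:", [round(x, 2) for x in sample])
print(f"mean - 65 = {mean - MU:.2f}, sample sd = {sd:.2f}, statistic = {t:.1f}")
```

The printed sample should show the pattern described above: the three
heights sit close together (small sample standard deviation) while
their mean is noticeably off 65.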
You can also repeat the second simulation with a larger sample size
than three (say, 10) using the following command:
randNorm(65, 2.8, 10)>L1:(mean(L1)-65)/(stdev(L1)/root(10))
This time the large values are relatively unlikely, because the sample
standard deviation has less variability in it (and behaves more like
the constant sigma) when the sample size is large. t-distributions with
large degrees of freedom look more like the standard normal
distribution.
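The effect of sample size can be checked side by side in Python. Again
this is an illustrative sketch rather than part of the calculator
activity; the helper frac_beyond, the seeds, and the 10,000 repetitions
are our own assumptions.

```python
import math
import random
import statistics

MU, SIGMA = 65.0, 2.8  # population values from the activity


def frac_beyond(n, cutoff=3.0, reps=10_000, seed=0):
    """Proportion of simulated statistics (using the sample sd) beyond the cutoff."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        t = (statistics.mean(sample) - MU) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > cutoff:
            count += 1
    return count / reps


frac_n3 = frac_beyond(3, seed=4)
frac_n10 = frac_beyond(10, seed=5)
print(f"n = 3:  {frac_n3:.3f}")
print(f"n = 10: {frac_n10:.3f}")
```

The proportion for samples of ten should come out much smaller than the
proportion for samples of three, though still above what the standard
normal would give.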
Activity 2
This time you're going to collect and analyze your own data to
explore the validity of an urban myth. Well, perhaps this particular
"myth" hasn't achieved sufficient fame to be called an Urban Myth, but
we'll explore it anyway.
It is well accepted (and I'm told there are solid
theoretical reasons that a physicist could explain) that a coin
tossed into the air so that it flips has a 50% chance of landing heads.
It has something to do with angular momentum and, um, well, it's a well
accepted fact. What is not so well accepted is what happens if you spin
a coin on a hard surface and wait for it to fall. Does it still have a
50% chance of landing heads?
Your goal is to determine whether spinning a coin and waiting
for it to land still produces a 50% probability of landing heads. Here
are some guidelines:
a) Describe the test you're going to use. What are the
assumptions behind it? Do they seem plausibly satisfied by your
experiment?
b) Decide ahead of time how many spins you're going to do. No
fair spinning until you reach a conclusion you like!
c) Do the experiment; collect the data; summarize the data for
us; and carry out a test.
What do you conclude?
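If you want to check your arithmetic, one possible choice of test is an
exact two-sided binomial test of the hypothesis that the probability of
heads is 0.5. The Python sketch below is only one option among several
(a one-proportion z-test is another), and you still need to decide
whether its assumptions, independent spins with a constant probability
of heads, are plausible for your experiment. The function name
binomial_two_sided_p is our own, and the counts at the bottom are
hypothetical placeholders, not real results; replace them with your own
data.

```python
from math import comb


def binomial_two_sided_p(heads, spins, p0=0.5):
    """Exact two-sided p-value: total probability, under p0, of every
    outcome that is no more likely than the observed number of heads."""
    probs = [comb(spins, k) * p0**k * (1 - p0) ** (spins - k) for k in range(spins + 1)]
    observed = probs[heads]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


# Hypothetical counts for illustration only; substitute your own.
spins = 40
heads = 26
p_value = binomial_two_sided_p(heads, spins)
print(f"{heads} heads in {spins} spins: two-sided p-value = {p_value:.4f}")
```

A small p-value would suggest that spinning does not give a 50% chance
of heads; a large one means your data are consistent with 50%, which is
not the same as proving it.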
You might want to use the chat feature or bulletin boards (if
available) to share your results and increase your sample size.
