Unit 2: Two-Variable Relationships

Data Analysis & Activity

Activity 1

The applets in this activity are designed to help you understand technical aspects of regression. Feel free to explore. The instructions below are meant to guide your explorations; as you work, you may also want to note how you would use these applets in your own class.

1. This applet helps in understanding at least two fundamental aspects of regression:
a) The regression line can be strongly influenced by "unusual" observations.
b) How to visualize and read the residual plot.

Do this:
I. Place four or five points in a mostly linear pattern. Notice that the points in the residual plot (to the right) cluster around the line y = 0, and that the residual plot displays the vertical distance of each point from the regression line.
II. Now place a point in one of the extreme corners of the plot (upper or lower, left or right). How does the regression line react? What does the residual plot show?
III. Now place a point near the top or bottom of the plot, but horizontally near the middle of the other points. Does this affect the line as much?
IV. Clear the points and start over. This time, place your points so that they follow a quadratic relationship or some other curve. Note that the regression line still exists (and still indicates the general trend of the data), and pay attention to the pattern in the residuals.

2. This applet shows the correlation coefficient and also helps you compare your intuition about "best-fit" lines with the actual least-squares line. You can also explore the role of influential points. Click on the link below and then select: Correlation and Regression.

Do this:
I. Add points so that the correlation coefficient is as close to 1 as you can get it.
II. Start again; add points so that the correlation is close to 0.
III. Add points that roughly follow a linear trend so that the correlation is somewhere around 0.5. Display the least-squares line. Choose one point near an edge of the plot and move it up and down. How does the line react? Compare this with what happens when you move a point near the middle of the plot.
IV. Start again: create an approximately linear relationship with about 10 points. Now add another point far away from the others but still following the linear trend. What happens as you move this distant point up and down?

Activity 2

We will look at a data set about an issue sure to be near and dear to all of you: standardized testing and what leads to higher test scores. The data are aggregated at the state level for all 50 states and include the average cost per pupil, the student:teacher ratio, the estimated average teacher salary, and the percentage of students who took the SAT for the 1994-1995 school year. The goal is to use this information to predict each state’s average Verbal, Math, and combined SAT scores.

Without looking at the data, which variable do you think will be of most use in predicting SAT scores? Why?

Download the data set from inspire.stat.ucla.edu/unit_02/SATdata.txt (note that there is an underscore _ between the words "unit" and "02").

Your primary goal is to understand the relation between SAT scores and the various cost variables. The questions below are meant to guide your analysis.

(1) What is the relationship between the amount of money spent per pupil and average total SAT score? What’s the correlation? What does the scatter plot look like? What is the slope coefficient in the linear regression model that predicts total SAT score from spending per pupil? What does it suggest about how SAT scores can be expected to change when more money is spent per pupil?

(2) What do you find when you examine the relationship between the student:teacher ratio and total SAT score?

(3) Of the variables included in the data set, which is most highly correlated with total SAT score? Does this variable seem to have a larger effect on Math or on Verbal scores, or is the effect about equal?

(4) Examine the relationship between total SAT score and the variable with which it is most highly correlated. Suggest explanations for the shape of the relationship.

(5) Suppose your state decides to focus on raising its average SAT score. According to these data, what is the best thing to do to try to raise a state’s average SAT score?

(6) Remember that these are aggregate, state-level data. You cannot see what is happening in individual schools or even in individual districts. How does that affect our ability to understand SAT score variability? Is there any downside to having only state-level data?

Discuss your answers to these questions on the Discussion Board. We will revisit aggregated SAT data in the Unit 3 demonstration.