Unit 3: More Two-Variable Relationships



Data Analysis & Activity

The Cost of Land

About Data Analysis Activities:

Our purpose for these activities is to give you "rich" data sets that provide an opportunity to practice some of the things we've talked about. These are not "exercises" in the usual sense: we are not grading them or looking for you to do any one particular thing with these data. But we hope that, as you explore, questions will occur to you that you can pass on to the rest of the class, and that you learn something from your explorations!

The Data and possible research questions

For this activity you'll analyze a data set about the value of land. The data set includes several variables that might contribute to the value of a property: the total acreage, the size of any buildings on the property, and various features of the building (number of rooms, number of floors, whether a garage is attached, etc.).

How do you think the total acreage of a property relates to the total value of the property? How is the size of the building on the land related to the total value? Is the number of bathrooms that important when estimating a property’s worth?

First, import this data set into your favorite data analysis software. If you use Fathom (and we recommend it for this exercise), you can do this simply by:

(1) making sure you're connected to the internet (which I assume you are if you're reading this!)
(2) starting Fathom
(3) selecting "Import from URL" under the "File" menu
(4) typing the URL that points to the data into the dialog box that opens: http://inspire.stat.ucla.edu/unit_03/land.txt
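If you prefer another tool to Fathom, the file can also be read with a short script. The Python below is a minimal sketch, not part of the original activity: it assumes land.txt is plain text whose first line holds the variable names and whose remaining lines each describe one parcel, with fields separated by tabs or spaces and no spaces inside any value. The file's exact layout is not documented here, the URL may no longer be served, and the helper names `parse_table` and `load_land` are hypothetical.

```python
import urllib.request


def parse_table(text):
    """Parse whitespace-delimited text with a header row into a dict of columns.

    Assumes the first non-empty line holds the variable names (surrounding
    double quotes are stripped) and that no value contains embedded spaces.
    Numeric entries become floats; anything else (e.g. a "Yes"/"No" code)
    is kept as a string.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    names = [name.strip('"') for name in lines[0].split()]
    columns = {name: [] for name in names}
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(values)}: {line!r}")
        for name, raw in zip(names, values):
            try:
                columns[name].append(float(raw))
            except ValueError:
                columns[name].append(raw)
    return columns


def load_land(url="http://inspire.stat.ucla.edu/unit_03/land.txt"):
    """Download the data file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return parse_table(text)
```

Calling `data = load_land()` needs network access; if the file matches the variable list described below, `len(data)` should be 11. If the download fails, you can save a local copy and pass its contents to `parse_table` instead.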

This data set consists of 11 variables. The variable named “LandVal” gives the value of the land alone, without consideration of the structure on it. “TotalVal” is the value of the property as a whole. “Acreage” is the size of the lot in acres, and the other eight variables describe key features of the buildings on the parcels of land: “Height” is the number of floors, “fl1Area” is the square footage of the first floor, and “Rooms”, “Bedrooms”, “FullBath”, “HalfBath”, “Fireplce”, and “Garage” are all exactly what they seem to be.

A possible way to proceed

Here are some things to try and think about:

  1. Make a histogram of LandVal. What can you say about the distribution of the value of these parcels of land? Are there unusual features, and if so, can you explain them?
  2. Now add Acreage to your plot and look at the scatterplot of LandVal vs. Acreage. What kind of relationship do you see?
  3. Superimpose the least-squares line on your scatterplot (look under the “Graph” menu at the top of the Fathom window). How good a fit do we get with the linear model? See any problems with it?
  4. Now create a residual plot (also available under the “Graph” menu). What features of this residual plot are troublesome?
  5. Try a transformation on Acreage to see if you can fix the problem(s) you saw with the direct linear model. What happens if you use sqrt(Acreage) to predict land value? Do any new problems come to light in the residual plot? Are any problems from the previous residual plot solved?
  6. Play with other possible transformations, keeping in mind that you can transform LandVal as well as Acreage. What transformation seems to fit best? Why?
  7. Given your best-fitting model, what would you expect to pay for a plot of land 3.2 acres in size? What about a plot of 0.8 acres? Of 9 acres? How confident are you in these estimates?
  8. Can you think of any other way to predict the value of a parcel of land given only the acreage information? Given some of the other available information found in the dataset? Explain.
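Questions 3 through 6 can also be sketched outside Fathom. The Python below is a minimal, illustrative version under stated assumptions: the least-squares line comes from the usual closed-form slope and intercept, residuals are observed minus fitted values, and R² is used as one rough way to compare transformations. The function names (`fit_line`, `residuals`, `r_squared`, `compare_transforms`) are hypothetical, and the three candidate transformations are only examples. Comparing R² between models with different response scales (say, LandVal versus log LandVal) is not a like-for-like comparison, so the residual plots still matter.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no spread, so the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed minus fitted values; plot these against x to look for patterns."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def r_squared(x, y, b0, b1):
    """Fraction of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y has no spread, so R^2 is undefined")
    ss_res = sum(r ** 2 for r in residuals(x, y, b0, b1))
    return 1 - ss_res / ss_tot


# Example transformations (illustrative choices, not a recommendation).
# math.log requires strictly positive values, so parcels with 0 acres
# or a LandVal of 0 would need to be handled separately.
TRANSFORMS = {
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
}


def compare_transforms(acreage, land_val):
    """Fit every (Acreage transform, LandVal transform) pair.

    Returns a dict mapping (x_name, y_name) to (b0, b1, R^2).
    """
    results = {}
    for x_name, fx in TRANSFORMS.items():
        for y_name, fy in TRANSFORMS.items():
            x = [fx(a) for a in acreage]
            y = [fy(v) for v in land_val]
            b0, b1 = fit_line(x, y)
            results[(x_name, y_name)] = (b0, b1, r_squared(x, y, b0, b1))
    return results
```

Given two equal-length lists of Acreage and LandVal values, `compare_transforms(acreage, land_val)` returns nine fits. The `("sqrt", "identity")` entry corresponds to the sqrt(Acreage) model in question 5, and `residuals` gives the values to plot for question 4.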
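For question 7, whichever model you settle on, a prediction comes from substituting the new acreage into the fitted equation and, if LandVal itself was transformed, undoing that transformation. As an illustration only, suppose the chosen model is the square-root model from question 5, with fitted intercept $b_0$ and slope $b_1$ (placeholder symbols, not values estimated from these data):

$$
\widehat{\text{LandVal}} = b_0 + b_1\sqrt{\text{Acreage}}
$$

$$
\begin{aligned}
\widehat{\text{LandVal}}(3.2) &= b_0 + b_1\sqrt{3.2} \approx b_0 + 1.789\,b_1 \\
\widehat{\text{LandVal}}(0.8) &= b_0 + b_1\sqrt{0.8} \approx b_0 + 0.894\,b_1 \\
\widehat{\text{LandVal}}(9) &= b_0 + b_1\sqrt{9} = b_0 + 3\,b_1
\end{aligned}
$$

If you modeled $\log(\text{LandVal})$ instead, the same substitution gives a predicted log value, and $\widehat{\text{LandVal}} = \exp\left(b_0 + b_1\sqrt{\text{Acreage}}\right)$ returns it to the original scale. When judging how confident to be, check whether each acreage lies inside the range of Acreage in the data; a value beyond the largest parcels is an extrapolation.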