MIS 301, Statistical Analysis for Business, Project
Boston all.xlsx contains housing data for 506 census tracts of Boston from the 1970 census.
There are 13 variables and 506 observations:
1. crim: per capita crime rate by town.
2. zn: proportion of residential land zoned for lots over 25,000 square feet.
3. indus: proportion of non-retail business acres per town.
4. chas: Charles River dummy variable (1, if tract bounds river; 0 otherwise).
5. nox: nitrogen oxides concentration (parts per 10 million).
6. rm: average number of rooms per dwelling.
7. age: proportion of owner-occupied units built prior to 1940.
8. dis: weighted mean of distances to five Boston employment centers.
9. rad: index of accessibility to radial highways.
10. tax: full-value property-tax rate per $10,000.
11. ptratio: pupil-teacher ratio by town.
12. lstat: lower status of the population (percent).
13. medv: median value of owner-occupied homes in $1000s.
Our goal is to explain house prices (medv) using all of the available information.
a. We first study the relationship between rm (average number of rooms) and medv
(house price). Please make a scatter plot of medv against rm and calculate their sample correlation
using Excel. Please comment. (15%).
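As an optional cross-check on the Excel result (for example, from Excel's CORREL function), the sample correlation can be sketched in a few lines of Python. This is only an illustration: the function name sample_correlation and the toy values below are assumptions made for the example and are not taken from the Boston data.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values for illustration only, NOT from the data set.
toy_rm = [5.0, 6.0, 6.5, 7.0, 8.0]
toy_medv = [15.0, 20.0, 24.0, 28.0, 40.0]
r = sample_correlation(toy_rm, toy_medv)
print(f"sample correlation = {r:.3f}")
```

A value near +1 indicates a strong positive linear association, a value near -1 a strong negative one, and a value near 0 little linear association. The scatter plot should always be read alongside the number, since correlation only measures linear association.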
b. We regress medv (as y) on rm (as x). Please include the Excel output and write down the
estimated regression model. (10%).
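For reference, the least-squares estimates that appear in a simple regression output can be sketched as follows. This is a minimal illustration under stated assumptions: the function name fit_simple_regression and the toy values are hypothetical, and the numbers are not from the Boston data. The estimated model has the form medv = b0 + b1 * rm, where b1 is the slope and b0 is the intercept.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for the model y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical toy values for illustration only, NOT from the data set.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(toy_x, toy_y)
print(f"estimated model: y = {b0:.2f} + {b1:.2f} * x")
```

The slope b1 is interpreted as the estimated change in y for a one-unit increase in x, and the two coefficients should match the intercept and slope rows of the Excel regression output.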
c. Secondly, we investigate the relationship between tax (property tax) and medv (house price).
Please make a scatter plot of medv against tax and calculate their sample correlation using Excel.
Please comment. (15%).