As I progressed through the Edward W. Frees book, these three days were about
regression with multiple explanatory variables. Little was new here, since more
or less the same concepts carry over to multiple linear regression. However, my
infatuation with correlations and scatter plots took a hard hit as I learned
how deceiving looks can be.
The book illustrates this with a dataset that lists the prices of 37
refrigerators along with their details and features. The regression fits
refrigerator price to the following explanatory variables:
- ECOST: Energy Cost;
- RSIZE: Refrigerator Compartment Size;
- FSIZE: Freezer Compartment Size;
- SHELVES: Number of Shelves;
- FEATURES: Number of Features.
Figure: Scatter Plot Matrix
The regression equation came out like this:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -797.808 | 271.409 | -2.940 |
| ECOST | -6.958 | 2.275 | -3.058 |
| RSIZE | 76.497 | 19.442 | 3.935 |
| FSIZE | 137.381 | 23.763 | 5.781 |
| SHELVES | 37.937 | 9.886 | 3.837 |
| FEATURES | 23.764 | 4.512 | 5.267 |
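
As a quick sanity check on the table, each t Stat should simply be the coefficient divided by its standard error. The short Python sketch below recomputes that ratio from the numbers above. The dictionary name `estimates` is my own, and the intercept and ECOST values are entered as negatives, which is how the table marks them.

```python
# Coefficients and standard errors copied from the regression table.
# The intercept and ECOST entries are the ones the table marks as negative.
estimates = {
    "Intercept": (-797.808, 271.409),
    "ECOST": (-6.958, 2.275),
    "RSIZE": (76.497, 19.442),
    "FSIZE": (137.381, 23.763),
    "SHELVES": (37.937, 9.886),
    "FEATURES": (23.764, 4.512),
}

# t statistic = estimated coefficient / its standard error
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name:<10} t = {t:7.3f}")
```

The ratios agree with the t Stat column to within rounding, and every one of them is larger than 2 in absolute value, the usual rough threshold for a coefficient to be considered significant.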
The coefficient for ECOST is of particular importance because of its negative sign. This makes sense: a higher energy cost makes a refrigerator less attractive, while a lower energy cost increases consumer surplus, so consumers would be willing to pay more for it. However, the simple correlations disagree:
Figure: Correlation plot made with the corrplot package in R
The data show a positive correlation of 0.52 between energy cost and price.
The book resolves this by introducing the concept of the “Added
Variable Plot”.
Figure: Added Variable Plot: regressions of PRICE and ECOST against the other 4 explanatory variables
The plot shows the residuals of two regressions against each other. The first
regression uses PRICE as the response variable and the second uses ECOST, with
both regressions using the four remaining explanatory variables, i.e. RSIZE,
FSIZE, SHELVES and FEATURES. The correlation of the two sets of residuals is
the partial correlation, that is, the correlation after controlling for the
effect of the other explanatory variables. In this case it worked out to be
-0.48, almost the same magnitude as the simple correlation but in the opposite
direction. Thus, it is possible
that the positive relationship between PRICE and ECOST is due not to a causal
relationship but rather to one or more additional variables that cause both
variables to be large (exactly quoting the book).
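
To convince myself of the mechanics, here is a minimal Python sketch of the same idea on made-up data. Nothing in it comes from the book's refrigerator dataset: the variables `size` and `features`, the coefficients used to simulate `price`, and the helpers `residualize` and `corr` are all my own assumptions. It only shows how a variable can be positively correlated with price on its own and negatively correlated once the other variables are controlled for. The residuals are computed with a plain Gram-Schmidt projection rather than a regression library.

```python
import math
import random


def center(xs):
    """Subtract the mean (this takes care of the regression intercept)."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(y, controls):
    """Residuals of an OLS regression of y on the controls, with intercept.

    The centered controls are orthogonalized by Gram-Schmidt, then their
    components are removed from the centered y one at a time.
    Assumes the controls are not perfectly collinear.
    """
    basis = []
    for c in controls:
        v = center(c)
        for b in basis:
            k = dot(v, b) / dot(b, b)
            v = [vi - k * bi for vi, bi in zip(v, b)]
        basis.append(v)
    r = center(y)
    for b in basis:
        k = dot(r, b) / dot(b, b)
        r = [ri - k * bi for ri, bi in zip(r, b)]
    return r


def corr(a, b):
    """Pearson correlation coefficient."""
    a, b = center(a), center(b)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Synthetic, purely illustrative data (NOT the book's dataset).
random.seed(0)
n = 200
size = [random.gauss(0, 1) for _ in range(n)]
features = [random.gauss(0, 1) for _ in range(n)]
# Bigger units cost more to run ...
ecost = [s + 0.5 * random.gauss(0, 1) for s in size]
# ... and sell for more, while energy cost itself lowers the price.
price = [3 * s + f - e + 0.5 * random.gauss(0, 1)
         for s, f, e in zip(size, features, ecost)]

# Simple correlation, ignoring the other variables
simple_r = corr(price, ecost)

# Added-variable construction: residualize both on the same controls
price_resid = residualize(price, [size, features])
ecost_resid = residualize(ecost, [size, features])
partial_r = corr(price_resid, ecost_resid)

print(f"simple correlation:  {simple_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

Plotting `price_resid` against `ecost_resid` would give the added variable plot for this toy example. Here `size` plays the role of the additional variable that makes both price and energy cost large at the same time.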
This case study was an enlightening experience in how regression coefficients and correlation coefficients can paint different pictures altogether. There is still much to learn as I move on to categorical variables and transformations of explanatory variables.