NMSA407 Linear Regression: Tutorial

Transformation of the response: one-way ANOVA with a log-transformed response to achieve (approximate) normality and homoscedasticity

Data Houses1987




Introduction

Load the data and compute basic summaries

data(Houses1987, package = "mffSM")
head(Houses1987)
##   price ground bed bath floor garage airco gas fbed fbath ffloor fgarage fairco fgas
## 1 42000    544   3    1     2      1     0   0    3     1      2       1     No   No
## 2 38500    372   2    1     1      0     0   0  <=2     1      1       0     No   No
## 3 49500    285   3    1     1      0     0   0    3     1      1       0     No   No
## 4 60500    619   3    1     2      0     0   0    3     1      2       0     No   No
## 5 61000    592   2    1     1      0     0   0  <=2     1      1       0     No   No
## 6 66000    387   3    1     1      0     1   0    3     1      1       0    Yes   No
dim(Houses1987)
## [1] 546  14
summary(Houses1987)
##      price            ground            bed             bath           floor           garage      
##  Min.   : 25000   Min.   : 153.0   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :0.0000  
##  1st Qu.: 49125   1st Qu.: 335.0   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000  
##  Median : 62000   Median : 428.0   Median :3.000   Median :1.000   Median :2.000   Median :0.0000  
##  Mean   : 68122   Mean   : 479.1   Mean   :2.965   Mean   :1.286   Mean   :1.808   Mean   :0.6923  
##  3rd Qu.: 82000   3rd Qu.: 592.0   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :190000   Max.   :1507.0   Max.   :6.000   Max.   :4.000   Max.   :4.000   Max.   :3.0000  
##      airco             gas           fbed     fbath     ffloor    fgarage   fairco     fgas    
##  Min.   :0.0000   Min.   :0.00000   <=2:138   1  :402   1  :227   0  :300   No :373   No :521  
##  1st Qu.:0.0000   1st Qu.:0.00000   3  :301   2  :133   2  :238   1  :126   Yes:173   Yes: 25  
##  Median :0.0000   Median :0.00000   4  : 95   >=3: 11   >=3: 81   >=2:120                      
##  Mean   :0.3168   Mean   :0.04579   >=5: 12                                                    
##  3rd Qu.:1.0000   3rd Qu.:0.00000                                                              
##  Max.   :1.0000   Max.   :1.00000
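Before modelling, it can help to see how the spread of the response changes across the levels of the factor. The sketch below uses simulated data in place of Houses1987 (the vectors `y` and `g` are hypothetical stand-ins for `ground` and `fbed`). It computes the group-wise size, mean, SD and coefficient of variation (SD/mean) on the original and on the log scale. An SD that grows in proportion to the mean (a roughly constant coefficient of variation) is the pattern a log transformation turns into roughly constant variance, since \(\mathrm{var}(\log Y) \approx \mathrm{var}(Y)/(\mathsf{E}\,Y)^2\).

```r
## Hypothetical stand-in for Houses1987: y plays the role of ground, g of fbed.
## Errors are multiplicative (log-normal), so the SD of y grows with its mean.
set.seed(1)
g  <- factor(rep(c("A", "B", "C", "D"), times = c(40, 80, 30, 10)))
mu <- c(5.8, 6.0, 6.2, 6.4)                 # group means on the log scale
y  <- exp(mu[as.integer(g)] + rnorm(length(g), sd = 0.35))

## Group-wise size, mean, SD and coefficient of variation (SD / mean)
grp_summary <- function(v, g) {
  m <- tapply(v, g, mean)
  s <- tapply(v, g, sd)
  data.frame(n = as.vector(table(g)), mean = as.vector(m),
             sd = as.vector(s), cv = as.vector(s / m),
             row.names = levels(g))
}
tab_orig <- grp_summary(y, g)        # SD expected to grow with the mean
tab_log  <- grp_summary(log(y), g)   # SDs expected to be similar across groups
print(round(tab_orig, 3))
print(round(tab_log, 3))
```

With the real data, the same function could be called as `grp_summary(Houses1987$ground, Houses1987$fbed)`.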




Dependence of ground size (ground) on the number of bedrooms (fbed)

Boxplots of the original response

library("colorspace")   # provides rainbow_hcl()
plot(ground ~ fbed, col = rainbow_hcl(4), data = Houses1987, xlab = "Number of bedrooms", ylab = "Ground size")

[Figure: boxplots of ground size by number of bedrooms]

One-way ANOVA linear model with the original response

m1 <- lm(ground ~ fbed, data = Houses1987)
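In a one-way ANOVA model such as `m1`, the fitted value of every observation is the sample mean of its group. With R's default treatment contrasts, the intercept is the mean of the first (reference) level and the remaining coefficients are differences from it. A minimal check on simulated data (the vectors `y`, `g` and the data frame `d` are hypothetical):

```r
## Hypothetical data: response y, factor g with three unequal-sized groups
set.seed(2)
g <- factor(rep(c("A", "B", "C"), times = c(20, 35, 15)))
y <- c(400, 450, 520)[as.integer(g)] + rnorm(length(g), sd = 60)
d <- data.frame(y = y, g = g)

m <- lm(y ~ g, data = d)
grp_means <- tapply(d$y, d$g, mean)

## Fitted values equal the mean of each observation's own group
head(cbind(fitted = fitted(m), group_mean = grp_means[as.integer(d$g)]))

## Treatment contrasts: intercept = mean of "A"; gB, gC = differences from "A"
coef(m)
c(grp_means[["A"]], grp_means[["B"]] - grp_means[["A"]], grp_means[["C"]] - grp_means[["A"]])
```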

Basic residual plots

library("mffSM")
plotLM(m1)

[Figure: basic residual plots of m1]

Normal QQ plot of standardized residuals

plot(m1, which = 2, pch = 21, col = "blue4", bg = "skyblue")

[Figure: normal QQ plot of standardized residuals of m1]




Log-transformation of the response

Houses1987 <- transform(Houses1987, lground = log(ground))

Boxplots on a logarithmic \(y\)-axis, labelled in original units

plot(lground ~ fbed, col = rainbow_hcl(4), data = Houses1987, xlab = "Number of bedrooms", ylab = "Ground size", yaxt = "n")
yaxis <- c(150, 250, 500, 1000, 1500)
axis(2, at = log(yaxis), labels = yaxis)

[Figure: boxplots of ground size by number of bedrooms, logarithmic y-axis]
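Instead of suppressing the axis with `yaxt = "n"` and adding ticks with `axis()`, base R's `boxplot()` also accepts `log = "y"`, which draws the boxes on a logarithmic axis labelled in original units. One difference is worth knowing: the box statistics (hinges, whiskers, outlier flags) are then computed from the untransformed values, so they need not coincide with those of the boxplot of `lground`. The sketch uses simulated data (hypothetical `y` and `g`, with the same level labels as `fbed`); `pdf(NULL)` only sends the plot to a null device so the snippet runs non-interactively.

```r
## Hypothetical data with the same factor levels as fbed
set.seed(4)
lev <- c("<=2", "3", "4", ">=5")
g <- factor(rep(lev, times = c(30, 60, 20, 5)), levels = lev)
y <- exp(6 + 0.1 * as.integer(g) + rnorm(length(g), sd = 0.4))

pdf(NULL)   # discard graphics output; drop this line and dev.off() interactively
## Log-scale axis labelled in original units, no manual axis() call needed
bx_log <- boxplot(y ~ g, log = "y", col = "grey90",
                  xlab = "Number of bedrooms", ylab = "Ground size")
invisible(dev.off())

## Medians of the plot of log(y), back-transformed for comparison.
## For even group sizes the median averages the two middle values on different
## scales, so the numbers differ slightly; whiskers and outlier flags can differ
## more because the 1.5 x IQR rule is applied on different scales.
bx_lin <- boxplot(log(y) ~ g, plot = FALSE)
rbind(log_axis = bx_log$stats[3, ], back_transformed = exp(bx_lin$stats[3, ]))
```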

The same boxplots with the \(y\)-axis labelled in log units

plot(lground ~ fbed, col = rainbow_hcl(4), data = Houses1987, xlab = "Number of bedrooms", ylab = "Log(ground size)")

[Figure: boxplots of log(ground size) by number of bedrooms]

One-way ANOVA linear model with log-transformed response

m2 <- lm(lground ~ fbed, data = Houses1987)
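Because `m2` models the logarithm of ground size, each non-intercept coefficient is a difference between group means of the logged response, and exponentiating it gives the ratio of the geometric means of the original response relative to the reference level (for `fbed`, the level `<=2`). The identity is checked below on simulated data (hypothetical `y` and `g`). Note that `confint()` gives individual intervals, not the family-wise ones produced by `TukeyHSD()`.

```r
## Hypothetical data: positive response y, factor g with reference level "A"
set.seed(5)
g <- factor(rep(c("A", "B", "C"), times = c(25, 40, 15)))
y <- exp(6 + c(0, 0.1, 0.2)[as.integer(g)] + rnorm(length(g), sd = 0.3))

m_log <- lm(log(y) ~ g)
gm <- tapply(y, g, function(v) exp(mean(log(v))))   # geometric mean per group

ratio_coef <- exp(coef(m_log)[c("gB", "gC")])        # exponentiated coefficients
ratio_gm   <- gm[c("B", "C")] / gm[["A"]]            # ratios of geometric means
cbind(ratio_coef, ratio_gm)

## 95% (individual) confidence intervals for the ratios:
## exponentiate the interval endpoints from the log scale
exp(confint(m_log)[c("gB", "gC"), ])
```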

Basic residual plots

plotLM(m2)

[Figure: basic residual plots of m2]

Normal QQ plot of standardized residuals

plot(m2, which = 2, pch = 21, col = "blue4", bg = "skyblue")

[Figure: normal QQ plot of standardized residuals of m2]
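The residual plots of `m1` and `m2` are visual checks. As a complement, one might run formal tests available in base R: `bartlett.test()` for equal variances across groups and `shapiro.test()` on standardized residuals (`rstandard()`). The sketch below applies both on the original and on the log scale to simulated log-normal data (hypothetical, not Houses1987). Such tests should be read with care: with large samples they flag departures too small to matter, and with small samples they can miss large ones. Bartlett's test is also itself sensitive to non-normality.

```r
## Hypothetical log-normal data: three groups, equal spread on the log scale
set.seed(3)
g  <- factor(rep(c("A", "B", "C"), each = 50))
y  <- exp(c(5, 6, 7)[as.integer(g)] + rnorm(length(g), sd = 0.5))
ly <- log(y)

fit_orig <- lm(y ~ g)
fit_log  <- lm(ly ~ g)

## Homoscedasticity: Bartlett's test of equal group variances
bt_orig <- bartlett.test(y, g)
bt_log  <- bartlett.test(ly, g)

## Normality: Shapiro-Wilk test on standardized residuals
sw_orig <- shapiro.test(rstandard(fit_orig))
sw_log  <- shapiro.test(rstandard(fit_log))

signif(c(bartlett_orig = bt_orig$p.value, bartlett_log = bt_log$p.value,
         shapiro_orig  = sw_orig$p.value, shapiro_log  = sw_log$p.value), 3)
```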

Tukey's pairwise comparisons

a2 <- aov(lground ~ fbed, data = Houses1987)
ta2 <- TukeyHSD(a2)

Estimates of differences between the expected values of log(ground), with 95% family-wise confidence intervals and adjusted p-values

print(ta2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lground ~ fbed, data = Houses1987)
## 
## $fbed
##                diff          lwr       upr     p adj
## 3-<=2    0.10731772  0.002999519 0.2116359 0.0410284
## 4-<=2    0.19389721  0.058619197 0.3291752 0.0013869
## >=5-<=2  0.15304154 -0.152356372 0.4584395 0.5687998
## 4-3      0.08657950 -0.032833927 0.2059929 0.2429299
## >=5-3    0.04572383 -0.252985582 0.3444332 0.9791733
## >=5-4   -0.04085567 -0.351733724 0.2700224 0.9866171
print(ta2, digits = 4)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lground ~ fbed, data = Houses1987)
## 
## $fbed
##             diff      lwr    upr  p adj
## 3-<=2    0.10732  0.00300 0.2116 0.0410
## 4-<=2    0.19390  0.05862 0.3292 0.0014
## >=5-<=2  0.15304 -0.15236 0.4584 0.5688
## 4-3      0.08658 -0.03283 0.2060 0.2429
## >=5-3    0.04572 -0.25299 0.3444 0.9792
## >=5-4   -0.04086 -0.35173 0.2700 0.9866
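With unequal group sizes, `TukeyHSD()` uses the Tukey-Kramer form of the interval: for levels \(i\) and \(j\), \(\bar y_i - \bar y_j \pm q_{0.95;\,k,\,\nu}\sqrt{\tfrac{\mathrm{MSE}}{2}\left(\tfrac{1}{n_i}+\tfrac{1}{n_j}\right)}\), where \(q\) is the studentized range quantile (`qtukey()`), \(k\) the number of levels and \(\nu\) the residual degrees of freedom; the adjusted p-value comes from `ptukey()`. The sketch recomputes one row by hand on simulated data (hypothetical `d`, `g`, `ly`) and compares it with `TukeyHSD()`.

```r
## Hypothetical data: log-scale response ly, four unequal-sized groups
set.seed(6)
g  <- factor(rep(c("A", "B", "C", "D"), times = c(30, 55, 20, 8)))
d  <- data.frame(g = g, ly = 6 + c(0, 0.1, 0.2, 0.15)[as.integer(g)] + rnorm(length(g), sd = 0.3))

a  <- aov(ly ~ g, data = d)
tk <- TukeyHSD(a)$g                          # rows named "B-A", "C-A", ...

k   <- nlevels(d$g)                          # number of group means
dfe <- a$df.residual                         # residual degrees of freedom
mse <- sum(residuals(a)^2) / dfe             # residual mean square
n   <- table(d$g)
mn  <- tapply(d$ly, d$g, mean)

## Row "B-A": mean of B minus mean of A
diff_BA <- mn[["B"]] - mn[["A"]]
se_BA   <- sqrt(mse / 2 * (1 / n[["B"]] + 1 / n[["A"]]))
half    <- qtukey(0.95, nmeans = k, df = dfe) * se_BA
p_BA    <- ptukey(abs(diff_BA) / se_BA, nmeans = k, df = dfe, lower.tail = FALSE)

rbind(manual   = c(diff = diff_BA, lwr = diff_BA - half, upr = diff_BA + half, `p adj` = p_BA),
      TukeyHSD = tk["B-A", ])
```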

Estimates of ratios of expected ground sizes: exponentiate the differences and the interval limits (the adjusted p-values stay unchanged)

ta2$fbed[, c("diff", "lwr", "upr")] <- exp(ta2$fbed[, c("diff", "lwr", "upr")])
colnames(ta2$fbed)[1] <- "ratio"
print(ta2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lground ~ fbed, data = Houses1987)
## 
## $fbed
##             ratio       lwr      upr     p adj
## 3-<=2   1.1132879 1.0030040 1.235698 0.0410284
## 4-<=2   1.2139715 1.0603714 1.389821 0.0013869
## >=5-<=2 1.1653734 0.8586822 1.581604 0.5687998
## 4-3     1.0904380 0.9676993 1.228745 0.2429299
## >=5-3   1.0467853 0.7764791 1.411190 0.9791733
## >=5-4   0.9599677 0.7034674 1.309994 0.9866171
print(ta2, digits = 4)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lground ~ fbed, data = Houses1987)
## 
## $fbed
##         ratio    lwr   upr  p adj
## 3-<=2   1.113 1.0030 1.236 0.0410
## 4-<=2   1.214 1.0604 1.390 0.0014
## >=5-<=2 1.165 0.8587 1.582 0.5688
## 4-3     1.090 0.9677 1.229 0.2429
## >=5-3   1.047 0.7765 1.411 0.9792
## >=5-4   0.960 0.7035 1.310 0.9866
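The exponentiated differences can be read as ratios of expected ground sizes (not only of geometric means or medians) under the assumption that the log-scale errors are identically distributed across the bedroom groups, which is what the residual plots of `m2` are meant to check. A short derivation, writing \(\log Y_{gi} = \mu_g + \varepsilon_{gi}\) for house \(i\) in group \(g\), with i.i.d. errors \(\varepsilon_{gi}\) for which \(\mathsf{E}\,e^{\varepsilon}\) is finite:

\[
\mathsf{E}\,Y_{gi} = e^{\mu_g}\,\mathsf{E}\,e^{\varepsilon}
\quad\Longrightarrow\quad
\frac{\mathsf{E}\,Y_{gi}}{\mathsf{E}\,Y_{hi}}
= \frac{e^{\mu_g}\,\mathsf{E}\,e^{\varepsilon}}{e^{\mu_h}\,\mathsf{E}\,e^{\varepsilon}}
= e^{\mu_g - \mu_h}.
\]

Because \(x \mapsto e^{x}\) is increasing, \(L < \mu_g - \mu_h < U\) holds exactly when \(e^{L} < e^{\mu_g - \mu_h} < e^{U}\), so the exponentiated Tukey limits keep their 95% family-wise coverage, and the adjusted p-values (testing \(\mu_g = \mu_h\), equivalently a ratio of 1) need no change. If the error distribution differed between groups, the factor \(\mathsf{E}\,e^{\varepsilon}\) would not cancel and the ratio of expected values would no longer equal \(e^{\mu_g - \mu_h}\).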