NMSA407 Linear Regression: Tutorial

ANOVA Tables of Type I, II and III

Data Cars2004nh




Introduction

Load the data and compute basic summaries

data(Cars2004nh, package = "mffSM")
head(Cars2004nh)
##                         vname type drive price.retail price.dealer   price cons.city cons.highway
## 1          Chevrolet.Aveo.4dr    1     1        11690        10965 11327.5       8.4          6.9
## 2 Chevrolet.Aveo.LS.4dr.hatch    1     1        12585        11802 12193.5       8.4          6.9
## 3      Chevrolet.Cavalier.2dr    1     1        14610        13697 14153.5       9.0          6.4
## 4      Chevrolet.Cavalier.4dr    1     1        14810        13884 14347.0       9.0          6.4
## 5   Chevrolet.Cavalier.LS.2dr    1     1        16385        15357 15871.0       9.0          6.4
## 6           Dodge.Neon.SE.4dr    1     1        13670        12849 13259.5       8.1          6.5
##   consumption engine.size ncylinder horsepower weight      iweight  lweight wheel.base length width
## 1        7.65         1.6         4        103   1075 0.0009302326 6.980076        249    424   168
## 2        7.65         1.6         4        103   1065 0.0009389671 6.970730        249    389   168
## 3        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 4        7.70         2.2         4        140   1214 0.0008237232 7.101676        264    465   173
## 5        7.70         2.2         4        140   1187 0.0008424600 7.079184        264    465   175
## 6        7.30         2.0         4        132   1171 0.0008539710 7.065613        267    442   170
##      ftype fdrive
## 1 personal  front
## 2 personal  front
## 3 personal  front
## 4 personal  front
## 5 personal  front
## 6 personal  front
dim(Cars2004nh)
## [1] 425  20
summary(Cars2004nh)
##     vname                type           drive        price.retail     price.dealer   
##  Length:425         Min.   :1.000   Min.   :1.000   Min.   : 10280   Min.   :  9875  
##  Class :character   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 20370   1st Qu.: 18973  
##  Mode  :character   Median :1.000   Median :1.000   Median : 27905   Median : 25672  
##                     Mean   :2.219   Mean   :1.692   Mean   : 32866   Mean   : 30096  
##                     3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.: 39235   3rd Qu.: 35777  
##                     Max.   :6.000   Max.   :3.000   Max.   :192465   Max.   :173560  
##                                                                                      
##      price          cons.city      cons.highway     consumption     engine.size      ncylinder     
##  Min.   : 10078   Min.   : 6.20   Min.   : 5.100   Min.   : 5.65   Min.   :1.300   Min.   :-1.000  
##  1st Qu.: 19600   1st Qu.:11.20   1st Qu.: 8.100   1st Qu.: 9.65   1st Qu.:2.400   1st Qu.: 4.000  
##  Median : 26656   Median :12.40   Median : 9.000   Median :10.70   Median :3.000   Median : 6.000  
##  Mean   : 31481   Mean   :12.36   Mean   : 9.142   Mean   :10.75   Mean   :3.208   Mean   : 5.791  
##  3rd Qu.: 37514   3rd Qu.:13.80   3rd Qu.: 9.800   3rd Qu.:11.65   3rd Qu.:3.900   3rd Qu.: 6.000  
##  Max.   :183012   Max.   :23.50   Max.   :19.600   Max.   :21.55   Max.   :8.300   Max.   :12.000  
##                   NA's   :14      NA's   :14       NA's   :14                                      
##    horsepower        weight        iweight             lweight        wheel.base        length     
##  Min.   :100.0   Min.   : 923   Min.   :0.0003067   Min.   :6.828   Min.   :226.0   Min.   :363.0  
##  1st Qu.:165.0   1st Qu.:1412   1st Qu.:0.0005542   1st Qu.:7.253   1st Qu.:262.0   1st Qu.:450.0  
##  Median :210.0   Median :1577   Median :0.0006341   Median :7.363   Median :272.0   Median :472.0  
##  Mean   :216.8   Mean   :1626   Mean   :0.0006412   Mean   :7.373   Mean   :274.9   Mean   :470.6  
##  3rd Qu.:255.0   3rd Qu.:1804   3rd Qu.:0.0007082   3rd Qu.:7.498   3rd Qu.:284.0   3rd Qu.:490.0  
##  Max.   :500.0   Max.   :3261   Max.   :0.0010834   Max.   :8.090   Max.   :366.0   Max.   :577.0  
##                  NA's   :2      NA's   :2           NA's   :2       NA's   :2       NA's   :26     
##      width            ftype       fdrive   
##  Min.   :163.0   personal:242   front:223  
##  1st Qu.:175.0   wagon   : 30   rear :110  
##  Median :180.0   SUV     : 60   4x4  : 92  
##  Mean   :181.1   pickup  : 24              
##  3rd Qu.:185.0   sport   : 49              
##  Max.   :206.0   minivan : 20              
##  NA's   :28

Complete cases subset used here

To keep the models fitted here comparable with later models that include further covariates, we restrict the analysis to the cars for which consumption, lweight and engine.size are all observed.

isComplete <- complete.cases(Cars2004nh[, c("consumption", "lweight", "engine.size")])
sum(!isComplete)
## [1] 16
CarsNow <- subset(Cars2004nh, isComplete, select = c("consumption", "drive", "fdrive", "weight", "lweight", "engine.size"))
dim(CarsNow)
## [1] 409   6
summary(CarsNow)
##   consumption        drive         fdrive        weight        lweight       engine.size   
##  Min.   : 5.65   Min.   :1.000   front:212   Min.   : 923   Min.   :6.828   Min.   :1.300  
##  1st Qu.: 9.65   1st Qu.:1.000   rear :108   1st Qu.:1415   1st Qu.:7.255   1st Qu.:2.400  
##  Median :10.70   Median :1.000   4x4  : 89   Median :1577   Median :7.363   Median :3.000  
##  Mean   :10.75   Mean   :1.699               Mean   :1622   Mean   :7.371   Mean   :3.178  
##  3rd Qu.:11.65   3rd Qu.:2.000               3rd Qu.:1804   3rd Qu.:7.498   3rd Qu.:3.800  
##  Max.   :21.55   Max.   :3.000               Max.   :2903   Max.   :7.973   Max.   :6.000
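
The mechanics of complete.cases() and subset() can be seen on a tiny example. This is only a sketch: the data frame toy and its values are made up for illustration and are not taken from Cars2004nh.

```r
## Hypothetical toy data with missing values (not part of Cars2004nh)
toy <- data.frame(consumption = c(7.6, NA, 9.1, 10.4),
                  lweight     = c(6.98, 7.08, NA, 7.36),
                  width       = c(168, NA, 175, NA))

## A row counts as complete if none of the listed columns is missing;
## width is deliberately not checked, so its NAs do not remove any row
ok <- complete.cases(toy[, c("consumption", "lweight")])
sum(!ok)                       # number of rows that will be dropped

## Keep the complete rows and only the columns of interest
toyNow <- subset(toy, ok, select = c("consumption", "lweight"))
dim(toyNow)
```

Only the variables passed to complete.cases() decide which rows are dropped, which is why 16 rows are removed above even though other columns of Cars2004nh (for example width) have more missing values.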




Dependence of consumption on lweight and fdrive

Scatterplots of consumption versus lweight by fdrive

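## BTY, PCH, COL and BGC are not defined in this section; the values below are assumed placeholders
BTY <- "o"
PCH <- 21
COL <- "red3"
BGC <- "pink"
par(mfrow = c(2, 2), bty = BTY, mar = c(5, 4, 3, 1) + 0.1)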
for (dr in levels(CarsNow[, "fdrive"])){
    plot(consumption ~ lweight, data = subset(CarsNow, fdrive == dr), pch = PCH, col = COL, bg = BGC,
         xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]", main = dr,
         xlim = range(CarsNow[, "lweight"]), ylim = range(CarsNow[, "consumption"]))
}

[Figure, chunk fig-AdditInter-03-01: scatterplots of consumption against lweight, one panel per level of fdrive]

Scatterplot of consumption versus lweight with fdrive distinguished in one plot

library("colorspace")     ### provides rainbow_hcl()
FCOL <- rainbow_hcl(3)
FCOL2 <- c("red3", "darkgreen", "darkblue")
FPCH <- c(21, 23, 24)
names(FCOL) <- names(FCOL2) <- names(FPCH) <- levels(CarsNow[, "fdrive"])
par(mfrow = c(1, 1), bty = BTY, mar = c(4, 4, 1, 1) + 0.1)
plot(consumption ~ lweight, data = CarsNow, pch = FPCH[fdrive], col = FCOL2[fdrive], bg = FCOL[fdrive],
     xlab = "Log(weight) [log(kg)]", ylab = "Consumption [l/100 km]")
legend(6.9, 21, legend = levels(CarsNow[, "fdrive"]), title = "Drive", pch = FPCH, col = FCOL2, pt.bg = FCOL)

[Figure, chunk fig-AdditInter-03-02: consumption against lweight in a single plot, points distinguished by fdrive]




Series of models with lweight and fdrive as covariates

mInter  <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
mAddit  <- lm(consumption ~ fdrive + lweight,                  data = CarsNow)
mDrive  <- lm(consumption ~ fdrive,                            data = CarsNow)
mWeight <- lm(consumption ~ lweight,                           data = CarsNow)
m0      <- lm(consumption ~ 1,                                 data = CarsNow)

Interaction model

summary(mInter)
## 
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight, 
##     data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4038 -0.6438 -0.1021  0.5672  4.3237 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
## fdriverear          19.8445     5.1297   3.869 0.000128 ***
## fdrive4x4          -12.5366     4.6506  -2.696 0.007319 ** 
## lweight              8.5716     0.3461  24.763  < 2e-16 ***
## fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
## fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057 
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16

Additive model

summary(mAddit)
## 
## Call:
## lm(formula = consumption ~ fdrive + lweight, data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4064 -0.6649 -0.1323  0.5747  5.1533 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -52.5605     1.9627 -26.780  < 2e-16 ***
## fdriverear    0.6964     0.1181   5.897 7.83e-09 ***
## fdrive4x4     0.8787     0.1363   6.445 3.29e-10 ***
## lweight       8.5381     0.2688  31.762  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9726 on 405 degrees of freedom
## Multiple R-squared:  0.7937, Adjusted R-squared:  0.7922 
## F-statistic: 519.5 on 3 and 405 DF,  p-value: < 2.2e-16

Model with fdrive only

summary(mDrive)
## 
## Call:
## lm(formula = consumption ~ fdrive, data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0913 -1.2489 -0.0440  0.9587  9.0511 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.7413     0.1247  78.149  < 2e-16 ***
## fdriverear    1.5527     0.2146   7.237 2.32e-12 ***
## fdrive4x4     2.7576     0.2292  12.030  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 406 degrees of freedom
## Multiple R-squared:  0.2799, Adjusted R-squared:  0.2764 
## F-statistic: 78.91 on 2 and 406 DF,  p-value: < 2.2e-16

Model with lweight only

summary(mWeight)
## 
## Call:
## lm(formula = consumption ~ lweight, data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6544 -0.7442 -0.1526  0.5160  5.1616 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -58.2480     1.8941  -30.75   <2e-16 ***
## lweight       9.3606     0.2569   36.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.035 on 407 degrees of freedom
## Multiple R-squared:  0.7654, Adjusted R-squared:  0.7648 
## F-statistic:  1328 on 1 and 407 DF,  p-value: < 2.2e-16

Intercept-only model

summary(m0)
## 
## Call:
## lm(formula = consumption ~ 1, data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1013 -1.1013 -0.0513  0.8987 10.7987 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  10.7513     0.1055   101.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.134 on 408 degrees of freedom
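
The residual sums of squares of the five models can be recovered, up to rounding, from the printed residual standard errors, because RSS = sigma^2 * df. The sketch below uses only numbers copied from the summaries above; the object names sigma, df and rss are ad hoc.

```r
## Residual standard errors and residual degrees of freedom copied from the summaries
sigma <- c(mInter = 0.9404, mAddit = 0.9726, mDrive = 1.815, mWeight = 1.035, m0 = 2.134)
df    <- c(mInter = 403,    mAddit = 405,    mDrive = 406,   mWeight = 407,   m0 = 408)

## Residual sum of squares of each model: RSS = sigma^2 * df
rss <- sigma^2 * df
round(rss, 1)

## Coefficient of determination of the interaction model: 1 - RSS / total sum of squares
1 - rss[["mInter"]] / rss[["m0"]]
```

Because the inputs are rounded, the results agree with the printed values (for example the Multiple R-squared of mInter) only to about three significant digits.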




Interaction versus additive model

Explicit comparison of the two models

anova(mAddit, mInter)
## Analysis of Variance Table
## 
## Model 1: consumption ~ fdrive + lweight
## Model 2: consumption ~ fdrive + lweight + fdrive:lweight
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    405 383.1                                  
## 2    403 356.4  2    26.702 15.097 4.758e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
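
The F statistic in the second row can be reproduced by hand from the two residual sums of squares. Writing RSS_0 for the additive model (405 residual degrees of freedom) and RSS_1 for the interaction model (403 residual degrees of freedom), and using the values as printed:

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(405 - 403)}{\mathrm{RSS}_1/403}
  = \frac{26.702/2}{356.40/403}
  \approx \frac{13.351}{0.8844}
  \approx 15.10
```

Under the null hypothesis that both interaction coefficients are zero, and under the normal linear model assumptions, this statistic follows the F distribution with 2 and 403 degrees of freedom, which gives the p-value shown in the table.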

Type I ANOVA table (its last row repeats the comparison above)

anova(mInter)
## Analysis of Variance Table
## 
## Response: consumption
##                 Df Sum Sq Mean Sq  F value    Pr(>F)    
## fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
## lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
## fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
## Residuals      403 356.40    0.88                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1




Type I ANOVA tables (results depend on the order of terms)

mInter1 <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
anova(mInter1)
## Analysis of Variance Table
## 
## Response: consumption
##                 Df Sum Sq Mean Sq  F value    Pr(>F)    
## fdrive           2 519.89  259.94  293.935 < 2.2e-16 ***
## lweight          1 954.26  954.26 1079.040 < 2.2e-16 ***
## fdrive:lweight   2  26.70   13.35   15.097 4.758e-07 ***
## Residuals      403 356.40    0.88                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mInter2 <- lm(consumption ~ lweight + fdrive + fdrive:lweight, data = CarsNow)
anova(mInter2)
## Analysis of Variance Table
## 
## Response: consumption
##                 Df  Sum Sq Mean Sq  F value    Pr(>F)    
## lweight          1 1421.57 1421.57 1607.458 < 2.2e-16 ***
## fdrive           2   52.58   26.29   29.726 9.079e-13 ***
## lweight:fdrive   2   26.70   13.35   15.097 4.758e-07 ***
## Residuals      403  356.40    0.88                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
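
Each row of a type I table is the drop in the residual sum of squares when the term is added to the terms listed before it. A minimal sketch on simulated data follows; the variables g, x, y and the helper rss() are hypothetical and unrelated to Cars2004nh.

```r
## Simulated data: g is a 3-level factor, x a numeric covariate correlated with g
set.seed(407)
n <- 120
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- rnorm(n) + as.numeric(g)
y <- 1 + 0.5 * as.numeric(g) + 2 * x + rnorm(n)

## Helper: residual sum of squares of a fitted linear model
rss <- function(formula) deviance(lm(formula))

## Sequential (type I) sums of squares for the order g, x, g:x
ssSeq <- c(g     = rss(y ~ 1)     - rss(y ~ g),
           x     = rss(y ~ g)     - rss(y ~ g + x),
           "g:x" = rss(y ~ g + x) - rss(y ~ g * x))

## They agree with the first three rows of the type I table
tabI <- anova(lm(y ~ g + x + g:x))
cbind(byHand = ssSeq, anova = tabI[1:3, "Sum Sq"])

## With the order x, g, x:g the row for g is adjusted for x and therefore differs
tabI2 <- anova(lm(y ~ x + g + x:g))
c(gFirst = tabI["g", "Sum Sq"], gAfterX = tabI2["g", "Sum Sq"])
```

The same mechanism explains the two tables above: fdrive has a sum of squares of 519.89 when entered first, but only 52.58 when entered after lweight, while the interaction row and the residual row are identical in both orderings.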




Type II ANOVA tables

library("car")
Anova(mInter1, type = "II")
## Anova Table (Type II tests)
## 
## Response: consumption
##                Sum Sq  Df  F value    Pr(>F)    
## fdrive          52.58   2   29.726 9.079e-13 ***
## lweight        954.26   1 1079.040 < 2.2e-16 ***
## fdrive:lweight  26.70   2   15.097 4.758e-07 ***
## Residuals      356.40 403                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mInter2, type = "II")     ### the same results, the order of terms does not matter
## Anova Table (Type II tests)
## 
## Response: consumption
##                Sum Sq  Df  F value    Pr(>F)    
## lweight        954.26   1 1079.040 < 2.2e-16 ***
## fdrive          52.58   2   29.726 9.079e-13 ***
## lweight:fdrive  26.70   2   15.097 4.758e-07 ***
## Residuals      356.40 403                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
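
A type II table adjusts each main effect for the other main effect, but not for the interaction that contains it. The sketch below reproduces this logic with base R only, on simulated data; the variables g, x, y and the helper rss() are hypothetical and unrelated to Cars2004nh.

```r
## Simulated data: g is a 3-level factor, x a numeric covariate correlated with g
set.seed(407)
n <- 120
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- rnorm(n) + as.numeric(g)
y <- 1 + 0.5 * as.numeric(g) + 2 * x + rnorm(n)

## Helper: residual sum of squares of a fitted linear model
rss <- function(formula) deviance(lm(formula))

## Type II sums of squares: main effects are compared within the additive model,
## the interaction is compared with the additive model
ssII <- c(g     = rss(y ~ x)     - rss(y ~ g + x),
          x     = rss(y ~ g)     - rss(y ~ g + x),
          "g:x" = rss(y ~ g + x) - rss(y ~ g * x))

## The F statistics use the residual mean square of the full (interaction) model
mFull <- lm(y ~ g * x)
msRes <- deviance(mFull) / df.residual(mFull)
dfII  <- c(g = 2, x = 1, "g:x" = 2)
cbind(SumSq = ssII, Df = dfII, F = ssII / dfII / msRes)
```

This is why, in the tables above, the type II row for fdrive (52.58) equals the fdrive row of anova(mInter2), where fdrive enters after lweight, and the type II row for lweight (954.26) equals the lweight row of anova(mInter1), where lweight enters after fdrive.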




Type III ANOVA tables

Anova(mInter1, type = "III")
## Anova Table (Type III tests)
## 
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)    
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mInter2, type = "III")     ### the same results, the order of terms does not matter
## Anova Table (Type III tests)
## 
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)    
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight:fdrive  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Use three different parameterizations of the categorical covariate fdrive

mInter    <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow)
mInterSAS <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow,
                contrasts = list(fdrive = contr.SAS))
mIntersum <- lm(consumption ~ fdrive + lweight + fdrive:lweight, data = CarsNow,
                contrasts = list(fdrive = contr.sum))
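
The three parameterizations differ only in the contrast matrix used to code fdrive. As a sketch, the matrices can be printed directly; the level order front, rear, 4x4 is the one shown by summary(CarsNow), and the object names lev, cTrt, cSAS and cSum are ad hoc.

```r
## Levels of fdrive in the order in which they appear in the data
lev <- c("front", "rear", "4x4")

cTrt <- contr.treatment(lev)   # R default: the first level (front) is the reference
cSAS <- contr.SAS(lev)         # the last level (4x4) is the reference
cSum <- contr.sum(lev)         # each column sums to zero over the levels

cTrt
cSAS
cSum
```

Each row gives the values of the two fdrive regressors for a car of that drive type; in the interaction model the same two columns are additionally multiplied by lweight.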

Interpretation of the model parameters?

summary(mInter)
## 
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight, 
##     data = CarsNow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4038 -0.6438 -0.1021  0.5672  4.3237 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -52.8047     2.5266 -20.900  < 2e-16 ***
## fdriverear          19.8445     5.1297   3.869 0.000128 ***
## fdrive4x4          -12.5366     4.6506  -2.696 0.007319 ** 
## lweight              8.5716     0.3461  24.763  < 2e-16 ***
## fdriverear:lweight  -2.5890     0.6956  -3.722 0.000226 ***
## fdrive4x4:lweight    1.7837     0.6240   2.858 0.004480 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057 
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
summary(mInterSAS)
## 
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight, 
##     data = CarsNow, contrasts = list(fdrive = contr.SAS))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4038 -0.6438 -0.1021  0.5672  4.3237 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -65.3414     3.9045 -16.735  < 2e-16 ***
## fdrive1          12.5366     4.6506   2.696  0.00732 ** 
## fdrive2          32.3811     5.9309   5.460 8.35e-08 ***
## lweight          10.3553     0.5192  19.943  < 2e-16 ***
## fdrive1:lweight  -1.7837     0.6240  -2.858  0.00448 ** 
## fdrive2:lweight  -4.3727     0.7961  -5.493 7.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057 
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
summary(mIntersum)
## 
## Call:
## lm(formula = consumption ~ fdrive + lweight + fdrive:lweight, 
##     data = CarsNow, contrasts = list(fdrive = contr.sum))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4038 -0.6438 -0.1021  0.5672  4.3237 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -50.3688     2.1489 -23.440  < 2e-16 ***
## fdrive1          -2.4360     2.5972  -0.938    0.349    
## fdrive2          17.4085     3.3558   5.188 3.38e-07 ***
## lweight           8.3031     0.2894  28.696  < 2e-16 ***
## fdrive1:lweight   0.2684     0.3517   0.763    0.446    
## fdrive2:lweight  -2.3206     0.4529  -5.124 4.64e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9404 on 403 degrees of freedom
## Multiple R-squared:  0.8081, Adjusted R-squared:  0.8057 
## F-statistic: 339.4 on 5 and 403 DF,  p-value: < 2.2e-16
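
All three fits describe the same three regression lines, one per drive type. The sketch below recovers the group-specific intercepts and slopes from the coefficients of summary(mInter) and shows how the coefficients of the other two parameterizations arise from them. The numbers are copied from the output above; the object names b, intercept and slope are ad hoc.

```r
## Coefficients copied from summary(mInter) (contr.treatment, reference = front)
b <- c(int = -52.8047, rear = 19.8445, x4 = -12.5366,
       lw = 8.5716, rear.lw = -2.5890, x4.lw = 1.7837)

## Intercept and slope of the regression line in each drive group
intercept <- c(front = b[["int"]],
               rear  = b[["int"]] + b[["rear"]],
               "4x4" = b[["int"]] + b[["x4"]])
slope     <- c(front = b[["lw"]],
               rear  = b[["lw"]] + b[["rear.lw"]],
               "4x4" = b[["lw"]] + b[["x4.lw"]])
rbind(intercept, slope)

## contr.SAS: (Intercept) and lweight belong to the last group (4x4),
## the remaining coefficients are differences of front and rear from 4x4
intercept - intercept[["4x4"]]
slope - slope[["4x4"]]

## contr.sum: (Intercept) and lweight are unweighted means over the three groups,
## fdrive1 and fdrive2 are deviations of front and rear from these means
mean(intercept); intercept - mean(intercept)
mean(slope);     slope - mean(slope)
```

Up to rounding of the copied coefficients, the printed values match summary(mInterSAS) and summary(mIntersum), which is consistent with the identical residuals, residual standard error and R-squared in all three fits.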

Type III ANOVA tables under the three parameterizations

Anova(mInter, type = "III")
## Anova Table (Type III tests)
## 
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)    
## (Intercept)    386.28   1 436.793 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        542.30   1 613.216 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mInterSAS, type = "III")
## Anova Table (Type III tests)
## 
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)    
## (Intercept)    247.68   1 280.063 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        351.72   1 397.714 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mIntersum, type = "III")
## Anova Table (Type III tests)
## 
## Response: consumption
##                Sum Sq  Df F value    Pr(>F)    
## (Intercept)    485.88   1 549.416 < 2.2e-16 ***
## fdrive          26.49   2  14.979 5.310e-07 ***
## lweight        728.22   1 823.440 < 2.2e-16 ***
## fdrive:lweight  26.70   2  15.097 4.758e-07 ***
## Residuals      356.40 403                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
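
In the three tables the rows for fdrive, fdrive:lweight and Residuals coincide, whereas the (Intercept) and lweight rows change with the parameterization. A one-degree-of-freedom type III row tests a single coefficient, so its F statistic is the square of the t statistic of that coefficient in the corresponding summary. For lweight this coefficient is the slope for front (contr.treatment), the slope for 4x4 (contr.SAS), or the mean of the three slopes (contr.sum). A quick check with numbers copied from the outputs above; the object names tLw and FLw are ad hoc.

```r
## t statistics of the lweight coefficient copied from the three summaries
tLw <- c(treatment = 24.763, SAS = 19.943, sum = 28.696)

## F statistics of the lweight row copied from the three type III tables
FLw <- c(treatment = 613.216, SAS = 397.714, sum = 823.440)

## Squared t statistics next to the F statistics (equal up to rounding of the inputs)
rbind(tSquared = tLw^2, F = FLw)

## Relative differences caused by rounding
abs(tLw^2 - FLw) / FLw
```

The type III test of a term that is contained in an interaction therefore answers a question that depends on how the factor is coded, so the parameterization should be stated whenever such a row is reported.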