NMSA407 Linear Regression: Tutorial

Multiple comparison procedures (Tukey, Hothorn–Bretz–Westfall)

Data Howells

Introduction

Full (unbalanced) dataset

data(HowellsAll, package = "mffSM")
head(HowellsAll)

##   gender popul oca gol fgender fpopul fgen.pop fpop.gen
## 1      1     1 123 176       M   BERG   M:BERG   BERG:M
## 2      1     1 115 173       M   BERG   M:BERG   BERG:M
## 3      1     1 117 176       M   BERG   M:BERG   BERG:M
## 4      1     1 113 185       M   BERG   M:BERG   BERG:M
## 5      1     1 121 170       M   BERG   M:BERG   BERG:M
## 6      1     1 118 180       M   BERG   M:BERG   BERG:M

dim(HowellsAll)

## [1] 289   8

summary(HowellsAll)

##      gender           popul            oca             gol        fgender    fpopul   
##  Min.   :0.0000   Min.   :0.000   Min.   :102.0   Min.   :155.0   F:156   AUSTR : 71  
##  1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:113.0   1st Qu.:172.0   M:133   BERG  :109  
##  Median :0.0000   Median :1.000   Median :116.0   Median :178.0           BURIAT:109  
##  Mean   :0.4602   Mean   :1.131   Mean   :115.8   Mean   :178.1                       
##  3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:119.0   3rd Qu.:184.0                       
##  Max.   :1.0000   Max.   :2.000   Max.   :131.0   Max.   :198.0                       
##      fgen.pop      fpop.gen 
##  F:AUSTR :49   AUSTR:F :49  
##  F:BERG  :53   AUSTR:M :22  
##  F:BURIAT:54   BERG:F  :53  
##  M:AUSTR :22   BERG:M  :56  
##  M:BERG  :56   BURIAT:F:54  
##  M:BURIAT:55   BURIAT:M:55

Balanced subset (was artificially created)

data(Howells, package = "mffSM")
head(Howells)

##   gender popul oca gol fgender fpopul fgen.pop fpop.gen
## 1      0     0 110 191       F  AUSTR  F:AUSTR  AUSTR:F
## 2      0     0 117 183       F  AUSTR  F:AUSTR  AUSTR:F
## 3      0     0 119 181       F  AUSTR  F:AUSTR  AUSTR:F
## 4      0     0 108 172       F  AUSTR  F:AUSTR  AUSTR:F
## 5      0     0 113 190       F  AUSTR  F:AUSTR  AUSTR:F
## 6      0     0 110 180       F  AUSTR  F:AUSTR  AUSTR:F

dim(Howells)

## [1] 240   8

summary(Howells)

##      gender        popul        oca             gol        fgender    fpopul       fgen.pop 
##  Min.   :0.0   Min.   :0   Min.   :102.0   Min.   :155.0   F:120   AUSTR :80   F:AUSTR :40  
##  1st Qu.:0.0   1st Qu.:0   1st Qu.:112.0   1st Qu.:173.0   M:120   BERG  :80   F:BERG  :40  
##  Median :0.5   Median :1   Median :116.0   Median :179.0           BURIAT:80   F:BURIAT:40  
##  Mean   :0.5   Mean   :1   Mean   :115.7   Mean   :179.3                       M:AUSTR :40  
##  3rd Qu.:1.0   3rd Qu.:2   3rd Qu.:119.0   3rd Qu.:186.0                       M:BERG  :40  
##  Max.   :1.0   Max.   :2   Max.   :131.0   Max.   :201.0                       M:BURIAT:40  
##      fpop.gen 
##  AUSTR:F :40  
##  AUSTR:M :40  
##  BERG:F  :40  
##  BERG:M  :40  
##  BURIAT:F:40  
##  BURIAT:M:40

Some colors (used in plots below)

genCOL <- rainbow_hcl(2, start = 10, end = 280)
genCOL2 <- c("red3", "darkblue")
names(genCOL) <- names(genCOL2) <- levels(Howells[, "fgender"])

#popCOL <- c(rainbow_hcl(3, start = 5, end = 310), rainbow_hcl(3, start = 25, end = 330))
popCOL <- rainbow_hcl(3, start = 50, end = 250)
popCOL2 <- c("red", "darkgreen", "blue")
names(popCOL) <- names(popCOL2) <- levels(Howells[, "fpopul"])

Dependence of glabell-occipital length (`gol`) on population (`fpopul`) and gender (`fgender`)

Balanced dataset Howells will be analyzed.

Spaghetti plot

with(Howells, interaction.plot(fgender, fpopul,  gol, col = popCOL2, type = "b",
                               lwd = 2, lty = 1, xlab = "Gender", trace.label = "Population", ylab = "Mean of GOL"))

plot of chunk fig-Simult-01-01

F-test for the effect of population

m1 <- lm(gol ~ fpopul + fgender, data = Howells)
m0 <- lm(gol ~ fgender, data = Howells)

summary(m1)

## 
## Call:
## lm(formula = gol ~ fpopul + fgender, data = Howells)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7333  -4.5167  -0.1333   4.7042  15.9833 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  181.2333     0.8156  222.20   <2e-16 ***
## fpopulBERG   -10.5000     0.9990  -10.51   <2e-16 ***
## fpopulBURIAT  -9.2000     0.9990   -9.21   <2e-16 ***
## fgenderM       9.2833     0.8156   11.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.318 on 236 degrees of freedom
## Multiple R-squared:  0.525,  Adjusted R-squared:  0.519 
## F-statistic: 86.96 on 3 and 236 DF,  p-value: < 2.2e-16

anova(m0, m1)

## Analysis of Variance Table
## 
## Model 1: gol ~ fgender
## Model 2: gol ~ fpopul + fgender
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    238 14662.4                                  
## 2    236  9420.2  2    5242.1 65.664 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Levels of the two factors

(lpop <- levels(Howells[, "fpopul"]))

## [1] "AUSTR"  "BERG"   "BURIAT"

(lgen <- levels(Howells[, "fgender"]))

## [1] "F" "M"

Effect of population (`fpopul`)

Pseudocontrast matrix for fpopul

(CX <- contr.treatment(3))

##   2 3
## 1 0 0
## 2 1 0
## 3 0 1

rownames(CX) <- lpop
print(CX)

##        2 3
## AUSTR  0 0
## BERG   1 0
## BURIAT 0 1

\(\mathbb{L}\) matrix to estimate all partial effects of `fpopul` given `fgender`

L.popul <- cbind(0, rbind(CX[2,] - CX[1,],
                          CX[3,] - CX[1,],
                          CX[3,] - CX[2,]), 0)
colnames(L.popul) <- names(coef(m1))
rownames(L.popul) <- paste(lpop[c(2, 3, 3)], "-", lpop[c(1, 1, 2)], sep = "")
print(L.popul)

##              (Intercept) fpopulBERG fpopulBURIAT fgenderM
## BERG-AUSTR             0          1            0        0
## BURIAT-AUSTR           0          0            1        0
## BURIAT-BERG            0         -1            1        0

Unadjusted inference

library("mffSM")
LS.popul <- LSest(m1, L = L.popul)
print(LS.popul, digits = 3)

##              Estimate Std. Error t value P value   Lower Upper
## BERG-AUSTR      -10.5      0.999  -10.51  <2e-16 -12.468 -8.53
## BURIAT-AUSTR     -9.2      0.999   -9.21  <2e-16 -11.168 -7.23
## BURIAT-BERG       1.3      0.999    1.30  0.1944  -0.668  3.27

Unadjusted P-values

pvalUnadj <- LS.popul[["P value"]]
fpvalUnadj <- format.pval(pvalUnadj, digits = 3, eps = 1e-4)
names(fpvalUnadj) <- attr(LS.popul, "row.names")
print(fpvalUnadj)

##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##     "<1e-04"     "<1e-04"      "0.194"

Bonferroni adjusted P-values

pvalBonf <- p.adjust(LS.popul[["P value"]], method = "bonferroni")
fpvalBonf <- format.pval(pvalBonf, digits = 3, eps = 1e-4)
names(fpvalBonf) <- attr(LS.popul, "row.names")
print(fpvalBonf)

##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##     "<1e-04"     "<1e-04"      "0.583"

Bonferroni simultaneous 95% confidence intervals

m <- 3
(conf.level.Bonf <- 1 - 0.05/m)

## [1] 0.9833333

LSBonf <- LSest(m1, L = L.popul, conf.level = conf.level.Bonf)
print(LSBonf, digits = 3)

##              Estimate Std. Error t value P value  Lower Upper
## BERG-AUSTR      -10.5      0.999  -10.51  <2e-16 -12.91 -8.09
## BURIAT-AUSTR     -9.2      0.999   -9.21  <2e-16 -11.61 -6.79
## BURIAT-BERG       1.3      0.999    1.30  0.1944  -1.11  3.71

REMARK: P-values in the output above are NOT adjusted! Only the confidence intervals are obtained by the Bonferroni method.

Put all results in a data frame

MCP <- data.frame(Estimate   = LS.popul[["Estimate"]],
                  PvalUnadj  = fpvalUnadj,
                  PvalBonf   = fpvalBonf,
                  LowerUnadj = LS.popul[["Lower"]],
                  UpperUnadj = LS.popul[["Upper"]],                 
                  LowerBonf  = LSBonf[["Lower"]],
                  UpperBonf  = LSBonf[["Upper"]], stringsAsFactors = FALSE)
print(MCP, digits = 3)

##              Estimate PvalUnadj PvalBonf LowerUnadj UpperUnadj LowerBonf UpperBonf
## BERG-AUSTR      -10.5    <1e-04   <1e-04    -12.468      -8.53    -12.91     -8.09
## BURIAT-AUSTR     -9.2    <1e-04   <1e-04    -11.168      -7.23    -11.61     -6.79
## BURIAT-BERG       1.3     0.194    0.583     -0.668       3.27     -1.11      3.71

Tukey's MCP

a1 <- aov(gol ~ fpopul + fgender, data = Howells)
TukeyHSD(a1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gol ~ fpopul + fgender, data = Howells)
## 
## $fpopul
##               diff        lwr       upr     p adj
## BERG-AUSTR   -10.5 -12.856127 -8.143873 0.0000000
## BURIAT-AUSTR  -9.2 -11.556127 -6.843873 0.0000000
## BURIAT-BERG    1.3  -1.056127  3.656127 0.3958263
## 
## $fgender
##         diff      lwr     upr p adj
## M-F 9.283333 7.676465 10.8902     0

                ### It can be used in the same way
                ### for one-way ANOVA as well
                ### For example:
   aOneWay <- aov(gol ~ fpopul, data = Howells)
   TukeyHSD(aOneWay)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gol ~ fpopul, data = Howells)
## 
## $fpopul
##               diff        lwr       upr     p adj
## BERG-AUSTR   -10.5 -13.426047 -7.573953 0.0000000
## BURIAT-AUSTR  -9.2 -12.126047 -6.273953 0.0000000
## BURIAT-BERG    1.3  -1.626047  4.226047 0.5474721

Unadjusted inference for comparison

print(LS.popul)

##              Estimate Std. Error    t value P value       Lower     Upper
## BERG-AUSTR      -10.5  0.9989525 -10.511010  <2e-16 -12.4680032 -8.531997
## BURIAT-AUSTR     -9.2  0.9989525  -9.209647  <2e-16 -11.1680032 -7.231997
## BURIAT-BERG       1.3  0.9989525   1.301363  0.1944  -0.6680032  3.268003

Graphical representation of the results

plot(TukeyHSD(a1, which = "fpopul"))

plot of chunk fig-Simult-01-02

Put all results known by now in a data frame

ta1 <- TukeyHSD(a1)[["fpopul"]]
fpvalTukey <- format.pval(ta1[, "p adj"], digits = 3, eps = 1e-4)
names(fpvalTukey) <- rownames(ta1)
print(fpvalTukey)

##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##     "<1e-04"     "<1e-04"      "0.396"

MCP <- data.frame(Estimate   = LS.popul[["Estimate"]],
                  PvalUnadj  = fpvalUnadj,
                  PvalBonf   = fpvalBonf,
                  PvalTukey  = fpvalTukey,
                  LowerUnadj = LS.popul[["Lower"]],
                  UpperUnadj = LS.popul[["Upper"]],                 
                  LowerBonf  = LSBonf[["Lower"]],
                  UpperBonf  = LSBonf[["Upper"]],
                  LowerTukey = ta1[, "lwr"],
                  UpperTukey = ta1[, "upr"],
                  stringsAsFactors = FALSE)
print(MCP[, 1:4], digits = 3)

##              Estimate PvalUnadj PvalBonf PvalTukey
## BERG-AUSTR      -10.5    <1e-04   <1e-04    <1e-04
## BURIAT-AUSTR     -9.2    <1e-04   <1e-04    <1e-04
## BURIAT-BERG       1.3     0.194    0.583     0.396

print(MCP[, 5:10], digits = 3)

##              LowerUnadj UpperUnadj LowerBonf UpperBonf LowerTukey UpperTukey
## BERG-AUSTR      -12.468      -8.53    -12.91     -8.09     -12.86      -8.14
## BURIAT-AUSTR    -11.168      -7.23    -11.61     -6.79     -11.56      -6.84
## BURIAT-BERG      -0.668       3.27     -1.11      3.71      -1.06       3.66

Estimated means of the group means and the covariance matrix of the estimator

\(\mathbb{L}\) matrix to calculate means of the group means for population

L.mean.popul <- cbind(1, CX, 1/2)
colnames(L.mean.popul) <- names(coef(m1))
rownames(L.mean.popul) <- lpop
print(L.mean.popul)

##        (Intercept) fpopulBERG fpopulBURIAT fgenderM
## AUSTR            1          0            0      0.5
## BERG             1          1            0      0.5
## BURIAT           1          0            1      0.5

QUESTION: Would you know how to specify the \(\mathbb{L}\) matrix in case the sum contrasts were used for estimation?

Estimated means of the group means for population

LS.mean.popul <- LSest(m1, L = L.mean.popul)
print(LS.mean.popul)

##        Estimate Std. Error  t value    P value    Lower    Upper
## AUSTR   185.875  0.7063661 263.1426 < 2.22e-16 184.4834 187.2666
## BERG    175.375  0.7063661 248.2778 < 2.22e-16 173.9834 176.7666
## BURIAT  176.675  0.7063661 250.1182 < 2.22e-16 175.2834 178.0666

mean.popul <- LS.mean.popul[["Estimate"]]
names(mean.popul) <- attr(LS.mean.popul, "row.names")
print(mean.popul)

##   AUSTR    BERG  BURIAT 
## 185.875 175.375 176.675

Estimated differences between the means of the group means for population (partial effects of population given gender)

outer(mean.popul, mean.popul, "-")

##        AUSTR BERG BURIAT
## AUSTR    0.0 10.5    9.2
## BERG   -10.5  0.0   -1.3
## BURIAT  -9.2  1.3    0.0

print(LS.popul)

##              Estimate Std. Error    t value P value       Lower     Upper
## BERG-AUSTR      -10.5  0.9989525 -10.511010  <2e-16 -12.4680032 -8.531997
## BURIAT-AUSTR     -9.2  0.9989525  -9.209647  <2e-16 -11.1680032 -7.231997
## BURIAT-BERG       1.3  0.9989525   1.301363  0.1944  -0.6680032  3.268003

Covariance matrix for LSE's of the means of the group means

Theory says that such matrix should be diagonal with a constant diagonal \[\frac{\mbox{MS}_e}{J\cdot H}\] here: \(J = 40\), \(H = 2\), \(\mbox{MS}_e = 39.92\)

(MSe <- summary(m1)[["sigma"]]^2)

## [1] 39.91624

MSe / (40 * 2)

## [1] 0.498953

L.mean.popul %*% vcov(m1) %*% t(L.mean.popul)

##                AUSTR         BERG        BURIAT
## AUSTR   4.989530e-01 8.326673e-17 -2.775558e-17
## BERG    8.326673e-17 4.989530e-01  6.352828e-17
## BURIAT -2.775558e-17 6.187763e-17  4.989530e-01

round(L.mean.popul %*% vcov(m1) %*% t(L.mean.popul), 15)

##           AUSTR     BERG   BURIAT
## AUSTR  0.498953 0.000000 0.000000
## BERG   0.000000 0.498953 0.000000
## BURIAT 0.000000 0.000000 0.498953

Unbalanced (full) data

Function `TukeyHSD` for two-way classification in case of unbalanced design

Function TukeyHSD is ALWAYS (irrespective of whether the design is balanced or unbalanced) based on the \(\mathbf{T}\) statistic (for Theorem 12.3 or 12.4) which is composed of the group means.

\(\mathsf{E}\mathbf{T}\) is then in fact a WEIGHTED mean of the group means and not the mean of the means. Consequently, differences in \(\mathsf{E}\mathbf{T}\) do not provide partial effects.

m1unbalanced <- lm(gol ~ fpopul + fgender, data = HowellsAll)
summary(m1unbalanced)

## 
## Call:
## lm(formula = gol ~ fpopul + fgender, data = HowellsAll)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5400  -4.3103  -0.3103   4.4600  17.6897 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  181.0712     0.7814 231.724   <2e-16 ***
## fpopulBERG   -10.5311     0.9706 -10.850   <2e-16 ***
## fpopulBURIAT  -9.2213     0.9695  -9.511   <2e-16 ***
## fgenderM       9.7703     0.7529  12.977   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.284 on 285 degrees of freedom
## Multiple R-squared:  0.4729, Adjusted R-squared:  0.4674 
## F-statistic: 85.24 on 3 and 285 DF,  p-value: < 2.2e-16

a1unbalanced <- aov(gol ~ fpopul + fgender, data = HowellsAll)
TukeyHSD(a1unbalanced, which = "fpopul")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gol ~ fpopul + fgender, data = HowellsAll)
## 
## $fpopul
##                   diff         lwr       upr     p adj
## BERG-AUSTR   -8.538959 -10.7968783 -6.281039 0.0000000
## BURIAT-AUSTR -7.318775  -9.5766948 -5.060855 0.0000000
## BURIAT-BERG   1.220183  -0.7852877  3.225655 0.3250874

print(TukeyHSD(a1unbalanced, which = "fpopul"), digits = 3)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gol ~ fpopul + fgender, data = HowellsAll)
## 
## $fpopul
##               diff     lwr   upr p adj
## BERG-AUSTR   -8.54 -10.797 -6.28 0.000
## BURIAT-AUSTR -7.32  -9.577 -5.06 0.000
## BURIAT-BERG   1.22  -0.785  3.23 0.325

MCP for the WEIGHTED means of the group means was provided.

Inference for the means of the means (\(=\) partial effects of `fpopul` given `fgender` derived from the underlying linear model

Estimated partial effects based on model `m1unbalanced` are:

LS.popul.unbalanced <- LSest(m1unbalanced, L = L.popul)
print(LS.popul.unbalanced, digits = 3)

##              Estimate Std. Error t value P value   Lower Upper
## BERG-AUSTR     -10.53      0.971  -10.85 < 2e-16 -12.442 -8.62
## BURIAT-AUSTR    -9.22      0.970   -9.51 < 2e-16 -11.130 -7.31
## BURIAT-BERG      1.31      0.851    1.54 0.12498  -0.366  2.99

Estimated means of the group means based on model `m1unbalanced`:

LS.mean.popul.unbalanced <- LSest(m1unbalanced, L = L.mean.popul)
print(LS.mean.popul.unbalanced, digits = 5)

##        Estimate Std. Error t value    P value  Lower  Upper
## AUSTR    185.96    0.75939  244.88 < 2.22e-16 184.46 187.45
## BERG     175.43    0.60199  291.41 < 2.22e-16 174.24 176.61
## BURIAT   176.73    0.60191  293.63 < 2.22e-16 175.55 177.92

Covariance matrix for LSE's of the means of the group means

L.mean.popul %*% vcov(m1unbalanced) %*% t(L.mean.popul)

##                AUSTR          BERG        BURIAT
## AUSTR   0.5766692121 -1.483287e-03 -4.944292e-04
## BERG   -0.0014832875  3.623863e-01  3.578437e-05
## BURIAT -0.0004944292  3.578437e-05  3.622909e-01

round(L.mean.popul %*% vcov(m1unbalanced) %*% t(L.mean.popul), 5)

##           AUSTR     BERG   BURIAT
## AUSTR   0.57667 -0.00148 -0.00049
## BERG   -0.00148  0.36239  0.00004
## BURIAT -0.00049  0.00004  0.36229

Correlation matrix for LSE's of the means of the group means

cov2cor(L.mean.popul %*% vcov(m1unbalanced) %*% t(L.mean.popul))

##               AUSTR          BERG        BURIAT
## AUSTR   1.000000000 -3.244711e-03 -1.081713e-03
## BERG   -0.003244711  1.000000e+00  9.875948e-05
## BURIAT -0.001081713  9.875948e-05  1.000000e+00

round(cov2cor(L.mean.popul %*% vcov(m1unbalanced) %*% t(L.mean.popul)), 4)

##          AUSTR    BERG  BURIAT
## AUSTR   1.0000 -0.0032 -0.0011
## BERG   -0.0032  1.0000  0.0001
## BURIAT -0.0011  0.0001  1.0000

Correlations are quite low. This is consequence of the fact that design was not too unbalanced:

with(HowellsAll, table(fpopul, fgender))

##         fgender
## fpopul    F  M
##   AUSTR  49 22
##   BERG   53 56
##   BURIAT 54 55

Function `TukeyHSD` for one-way classification in case of unbalanced design

No problem to use function TukeyHSD for multiple comparison in a context of ONE-WAY classification even in case on UNBALANCED design. There, expectations of the group sample means are equal to the expectations of the response in the groups, i.e., to the quantities we want to compare.

Frequency table

with(HowellsAll, table(fpopul))

## fpopul
##  AUSTR   BERG BURIAT 
##     71    109    109

Tukey's MCP in one-way classification by `fpopul`

aOneWay <- aov(gol ~ fpopul, data = HowellsAll)
TukeyHSD(aOneWay)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gol ~ fpopul, data = HowellsAll)
## 
## $fpopul
##                   diff        lwr       upr    p adj
## BERG-AUSTR   -8.538959 -11.381824 -5.696093 0.000000
## BURIAT-AUSTR -7.318775 -10.161641 -4.475909 0.000000
## BURIAT-BERG   1.220183  -1.304833  3.745200 0.491191

Unadjusted inference for the differences of the group means

mOneWay <- lm(gol ~ fpopul, data = HowellsAll)
L.oneWay.popul <- cbind(0, rbind(CX[2,] - CX[1,],
                                 CX[3,] - CX[1,],
                                 CX[3,] - CX[2,]))
colnames(L.oneWay.popul) <- names(coef(mOneWay))
rownames(L.oneWay.popul) <- paste(lpop[c(2, 3, 3)], "-", lpop[c(1, 1, 2)], sep = "")
print(L.oneWay.popul)

##              (Intercept) fpopulBERG fpopulBURIAT
## BERG-AUSTR             0          1            0
## BURIAT-AUSTR           0          0            1
## BURIAT-BERG            0         -1            1

LSest(mOneWay, L = L.oneWay.popul)

##               Estimate Std. Error   t value    P value       Lower     Upper
## BERG-AUSTR   -8.538959   1.206659 -7.076531 1.1431e-11 -10.9140169 -6.163900
## BURIAT-AUSTR -7.318775   1.206659 -6.065323 4.1599e-09  -9.6938334 -4.943717
## BURIAT-BERG   1.220183   1.071747  1.138499    0.25586  -0.8893295  3.329697

Point estimates are (indeed) the same as those produced by TukeyHSD.

Hothorn-Bretz-Westfall MCP for the effect of population in the additive model

It is again worked with balanced data.

Underlying linear model

m1 <- lm(gol ~ fpopul + fgender, data = Howells)

Basis for MCP

library("multcomp")
mcp <- glht(m1, linfct = mcp(fpopul = "Tukey"), rhs = 0)
print(mcp)        ## Only point estimates.

## 
##   General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Linear Hypotheses:
##                     Estimate
## BERG - AUSTR == 0      -10.5
## BURIAT - AUSTR == 0     -9.2
## BURIAT - BERG == 0       1.3

NOTE: Tukey in the above call is only a keyword to say that all pairwise differences for factor fpopul are of interest. All calculation is (and will be) based on the Hothorn-Bretz-Westfall procedure.

It is possible to supply directly the \(\mathbb{L}\) matrix (which becomes necessary if simultaneous inference for general linear combinations of the regression coefficients becomes of interest):

mcp <- glht(m1, linfct = L.popul, rhs = 0)
print(mcp)

## 
##   General Linear Hypotheses
## 
## Linear Hypotheses:
##                   Estimate
## BERG-AUSTR == 0      -10.5
## BURIAT-AUSTR == 0     -9.2
## BURIAT-BERG == 0       1.3

MCP adjusted P-values

set.seed(20010911)
smcp <- summary(mcp)
print(smcp)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = gol ~ fpopul + fgender, data = Howells)
## 
## Linear Hypotheses:
##                   Estimate Std. Error t value Pr(>|t|)    
## BERG-AUSTR == 0    -10.500      0.999 -10.511   <1e-04 ***
## BURIAT-AUSTR == 0   -9.200      0.999  -9.210   <1e-04 ***
## BURIAT-BERG == 0     1.300      0.999   1.301    0.396    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

print(smcp[["test"]])

## $pfunction
## function (type = c("univariate", "adjusted", p.adjust.methods), 
##     ...) 
## {
##     type <- match.arg(type)
##     pfct <- function(q) {
##         switch(object$alternative, two.sided = {
##             low <- rep(-abs(q), dim)
##             upp <- rep(abs(q), dim)
##         }, less = {
##             low <- rep(q, dim)
##             upp <- rep(Inf, dim)
##         }, greater = {
##             low <- rep(-Inf, dim)
##             upp <- rep(q, dim)
##         })
##         pmvt(lower = low, upper = upp, df = df, corr = cr, ...)
##     }
##     switch(object$alternative, two.sided = {
##         if (df > 0) pvals <- 2 * (1 - pt(abs(tstat), df)) else pvals <- 2 * 
##             (1 - pnorm(abs(tstat)))
##     }, less = {
##         if (df > 0) pvals <- pt(tstat, df) else pvals <- pnorm(tstat)
##     }, greater = {
##         if (df > 0) pvals <- 1 - pt(tstat, df) else pvals <- 1 - 
##             pnorm(tstat)
##     })
##     if (type == "univariate") 
##         return(pvals)
##     if (type == "adjusted") {
##         ret <- numeric(length(tstat))
##         error <- 0
##         for (i in 1:length(tstat)) {
##             tmp <- pfct(tstat[i])
##             if (attr(tmp, "msg") != "Normal Completion" && length(grep("^univariate", 
##                 attr(tmp, "msg"))) == 0) 
##                 warning(attr(tmp, "msg"))
##             if (error < attr(tmp, "error")) 
##                 error <- attr(tmp, "error")
##             ret[i] <- tmp
##         }
##         ret <- 1 - ret
##         attr(ret, "error") <- error
##         return(ret)
##     }
##     return(p.adjust(pvals, method = type))
## }
## <environment: 0xa6f2d8c>
## 
## $qfunction
## function (conf.level, adjusted = TRUE, ...) 
## {
##     tail <- switch(object$alternative, two.sided = "both.tails", 
##         less = "lower.tail", greater = "upper.tail")
##     if (adjusted) {
##         calpha <- qmvt(conf.level, df = df, corr = cr, tail = tail, 
##             ...)
##     }
##     else {
##         calpha <- qmvt(conf.level, df = df, corr = matrix(1), 
##             tail = tail, ...)
##     }
##     ret <- calpha$quantile
##     attr(ret, "error") <- calpha$estim.prec
##     return(ret)
## }
## <environment: 0xa6f2d8c>
## 
## $coefficients
##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##        -10.5         -9.2          1.3 
## 
## $sigma
##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##    0.9989525    0.9989525    0.9989525 
## 
## $tstat
##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##   -10.511010    -9.209647     1.301363 
## 
## $pvalues
## [1] 0.0000000 0.0000000 0.3958337
## attr(,"error")
## [1] 2.665086e-05
## 
## $type
## [1] "single-step"
## 
## attr(,"class")
## [1] "mtest"

To get P-values, Monte Carlo integration is conducted to calculate needed integral. Observed slightly different P-values:

set.seed(221913277); print(summary(mcp)[["test"]][["pvalues"]])

## [1] 0.0000000 0.0000000 0.3958226
## attr(,"error")
## [1] 2.947862e-05

set.seed(186751820); print(summary(mcp)[["test"]][["pvalues"]])

## [1] 0.0000000 0.0000000 0.3958177
## attr(,"error")
## [1] 1.368038e-05

set.seed(201411199); print(summary(mcp)[["test"]][["pvalues"]])

## [1] 0.000000e+00 1.110223e-16 3.958150e-01
## attr(,"error")
## [1] 2.707282e-05

Simultaneous confidence intervals

set.seed(20010911)
cimcp <- confint(mcp, level = 0.95)
print(cimcp)

## 
##   Simultaneous Confidence Intervals
## 
## Fit: lm(formula = gol ~ fpopul + fgender, data = Howells)
## 
## Quantile = 2.358
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##                   Estimate lwr      upr     
## BERG-AUSTR == 0   -10.5000 -12.8556  -8.1444
## BURIAT-AUSTR == 0  -9.2000 -11.5556  -6.8444
## BURIAT-BERG == 0    1.3000  -1.0556   3.6556

Needed quantile of the max-abs-t-distribution is again calculated by the mean of the Monte Carlo method. Observed slighly different results. Needed quantile is shown as calpha attribute.

set.seed(221913277); print(confint(mcp, level = 0.95)[["confint"]], digits = 10)

##              Estimate           lwr          upr
## BERG-AUSTR      -10.5 -12.856444851 -8.143555149
## BURIAT-AUSTR     -9.2 -11.556444851 -6.843555149
## BURIAT-BERG       1.3  -1.056444851  3.656444851
## attr(,"conf.level")
## [1] 0.95
## attr(,"calpha")
## [1] 2.358915843

set.seed(186751820); print(confint(mcp, level = 0.95)[["confint"]], digits = 10)

##              Estimate          lwr         upr
## BERG-AUSTR      -10.5 -12.85586935 -8.14413065
## BURIAT-AUSTR     -9.2 -11.55586935 -6.84413065
## BURIAT-BERG       1.3  -1.05586935  3.65586935
## attr(,"conf.level")
## [1] 0.95
## attr(,"calpha")
## [1] 2.358339739

set.seed(201411199); print(confint(mcp, level = 0.95)[["confint"]], digits = 10)

##              Estimate           lwr          upr
## BERG-AUSTR      -10.5 -12.856465843 -8.143534157
## BURIAT-AUSTR     -9.2 -11.556465843 -6.843534157
## BURIAT-BERG       1.3  -1.056465843  3.656465843
## attr(,"conf.level")
## [1] 0.95
## attr(,"calpha")
## [1] 2.358936858

Put all results known by now in a data frame

fpvalHBW <- format.pval(smcp[["test"]][["pvalues"]], digits = 3, eps = 1e-4)
names(fpvalHBW) <- rownames(ta1)
print(fpvalHBW)

##   BERG-AUSTR BURIAT-AUSTR  BURIAT-BERG 
##     "<1e-04"     "<1e-04"      "0.396"

MCP <- data.frame(Estimate   = LS.popul[["Estimate"]],
                  PvalUnadj  = fpvalUnadj,
                  PvalBonf   = fpvalBonf,
                  PvalTukey  = fpvalTukey,
                  PvalHBW  = fpvalHBW,                  
                  LowerUnadj = LS.popul[["Lower"]],
                  UpperUnadj = LS.popul[["Upper"]],                 
                  LowerBonf  = LSBonf[["Lower"]],
                  UpperBonf  = LSBonf[["Upper"]],
                  LowerTukey = ta1[, "lwr"],
                  UpperTukey = ta1[, "upr"],
                  LowerHBW   = cimcp[["confint"]][, "lwr"],
                  UpperHBW   = cimcp[["confint"]][, "upr"],                  
                  stringsAsFactors = FALSE)
print(MCP[, 1:5], digits = 3)

##              Estimate PvalUnadj PvalBonf PvalTukey PvalHBW
## BERG-AUSTR      -10.5    <1e-04   <1e-04    <1e-04  <1e-04
## BURIAT-AUSTR     -9.2    <1e-04   <1e-04    <1e-04  <1e-04
## BURIAT-BERG       1.3     0.194    0.583     0.396   0.396

print(MCP[, 6:13], digits = 3)

##              LowerUnadj UpperUnadj LowerBonf UpperBonf LowerTukey UpperTukey LowerHBW UpperHBW
## BERG-AUSTR      -12.468      -8.53    -12.91     -8.09     -12.86      -8.14   -12.86    -8.14
## BURIAT-AUSTR    -11.168      -7.23    -11.61     -6.79     -11.56      -6.84   -11.56    -6.84
## BURIAT-BERG      -0.668       3.27     -1.11      3.71      -1.06       3.66    -1.06     3.66

NMSA407 Linear Regression: Tutorial

Introduction

Full (unbalanced) dataset

Balanced subset (was artificially created)

Some colors (used in plots below)

Dependence of glabell-occipital length (gol) on population (fpopul) and gender (fgender)

Spaghetti plot

F-test for the effect of population

Levels of the two factors

Effect of population (fpopul)

Pseudocontrast matrix for fpopul

\(\mathbb{L}\) matrix to estimate all partial effects of fpopul given fgender

Unadjusted inference

Unadjusted P-values

Bonferroni adjusted P-values

Bonferroni simultaneous 95% confidence intervals

Put all results in a data frame

Tukey's MCP

Unadjusted inference for comparison

Graphical representation of the results

Put all results known by now in a data frame

Estimated means of the group means and the covariance matrix of the estimator

\(\mathbb{L}\) matrix to calculate means of the group means for population

Estimated means of the group means for population

Estimated differences between the means of the group means for population (partial effects of population given gender)

Covariance matrix for LSE's of the means of the group means

Unbalanced (full) data

Function TukeyHSD for two-way classification in case of unbalanced design

Inference for the means of the means (\(=\) partial effects of fpopul given fgender derived from the underlying linear model

Estimated partial effects based on model m1unbalanced are:

Estimated means of the group means based on model m1unbalanced:

Covariance matrix for LSE's of the means of the group means

Correlation matrix for LSE's of the means of the group means

Function TukeyHSD for one-way classification in case of unbalanced design

Frequency table

Tukey's MCP in one-way classification by fpopul

Unadjusted inference for the differences of the group means

Hothorn-Bretz-Westfall MCP for the effect of population in the additive model

Underlying linear model

Basis for MCP

MCP adjusted P-values

Simultaneous confidence intervals

Put all results known by now in a data frame

Dependence of glabell-occipital length (`gol`) on population (`fpopul`) and gender (`fgender`)

Effect of population (`fpopul`)

\(\mathbb{L}\) matrix to estimate all partial effects of `fpopul` given `fgender`

Function `TukeyHSD` for two-way classification in case of unbalanced design

Inference for the means of the means (\(=\) partial effects of `fpopul` given `fgender` derived from the underlying linear model

Estimated partial effects based on model `m1unbalanced` are:

Estimated means of the group means based on model `m1unbalanced`:

Function `TukeyHSD` for one-way classification in case of unbalanced design

Tukey's MCP in one-way classification by `fpopul`