library("mffSM") # Dris data and LSest() function
library("car") # confidenceEllipse() function
library("colorspace") # rainbow_hcl() colors
We will work again with the datasets Dris and
peat from previous exercises.
peat <- read.csv("http://www.karlin.mff.cuni.cz/~vavraj/nmfm334/data/peat.csv",
header = TRUE, stringsAsFactors = TRUE)
data(Dris, package = "mffSM")
As usual, we start with the general notation for the linear regression model \[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \mathbf{X_1}^\top \\ \mathbf{X_2}^\top \\ \vdots \\\mathbf{X_n}^\top \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} = \mathbb{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}. \] For simplicity, we will assume the following normality assumption for the error terms: \(\boldsymbol{\varepsilon} \sim \mathsf{N}_n(\mathbf{0}, \sigma^2 \mathbb{I}_n)\), which hides several other assumptions: the errors have zero mean, share the common variance \(\sigma^2\) (homoscedasticity), are independent, and are normally distributed.
Given the regression matrix, the term \(\mathbb{X} \boldsymbol{\beta}\) only shifts the mean. Hence, \(\mathbf{Y} \,|\, \mathbb{X} \sim \mathsf{N}(\mathbb{X} \boldsymbol{\beta}, \sigma^2 \mathbb{I}_n)\). The least-squares estimator \(\widehat{\boldsymbol{\beta}} = \left(\mathbb{X}^\top \mathbb{X}\right)^{-1} \mathbb{X}^\top \mathbf{Y}\) is then just a linear transformation of \(\mathbf{Y}\), which preserves normality: \[ \widehat{\boldsymbol{\beta}} \,|\, \mathbb{X} \sim \mathsf{N}_p \left(\boldsymbol{\beta}, \sigma^2 \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\right). \] For any linear combination of coefficients \(\mathbf{c} \in \mathbb{R}^p\) we obtain \[ \mathbf{c}^\top \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \,\Big|\, \mathbb{X} \sim \mathsf{N} \left(0, \sigma^2 \mathbf{c}^\top\left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{c}\right). \] Moreover, the residual sum of squares is independent of \(\widehat{\boldsymbol{\beta}}\) and satisfies \(\dfrac{\widehat{\sigma}_n^2 (n-p)}{\sigma^2} = \frac{\mathsf{RSS}}{\sigma^2} \sim \chi^2_{n-p}\), which finally gives us \[ \dfrac{\mathbf{c}^\top \left( \widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)}{\sqrt{\widehat{\sigma}_n^2\mathbf{c}^\top\left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{c}}} = \dfrac{ \dfrac{\mathbf{c}^\top \widehat{\boldsymbol{\beta}} - \mathbf{c}^\top\boldsymbol{\beta}}{\sqrt{\sigma^2\mathbf{c}^\top\left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{c}}} }{ \sqrt{\frac{\mathsf{RSS}}{\sigma^2(n-p)}} } \,{\Huge|}\, \mathbb{X} \sim \mathsf{t}_{n-p}. \]
Let us denote \(\theta = \mathbf{c}^\top \boldsymbol{\beta}\) and \(\widehat{\theta} = \mathbf{c}^\top \widehat{\boldsymbol{\beta}}\); we then wish to test \(\mathrm{H}_0: \theta = \theta_0\). Under the null hypothesis, the test statistic satisfies \[ T = \dfrac{\widehat{\theta} - \theta_0}{\sqrt{\widehat{\sigma}_n^2\mathbf{c}^\top\left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{c}}} \,{\Huge|}\, \mathbb{X} \sim \mathsf{t}_{n-p}, \] which can be used to construct two-sided and one-sided tests at significance level \(\alpha\), depending on the choice of the alternative:
- If the alternative is two-sided (\(\mathrm{H}_1: \theta \neq \theta_0\)), then \(\mathrm{H}_0\) is rejected if \(|T| \geq \mathsf{t}_{n-p}(1-\alpha/2)\) with p-value \(p = 2\left(1-F_{\mathsf{t}_{n-p}}(|t|)\right)\),
- if the alternative is \(\mathrm{H}_1: \theta > \theta_0\) (greater), then \(\mathrm{H}_0\) is rejected if \(T \geq \mathsf{t}_{n-p}(1-\alpha)\) with p-value \(p = 1-F_{\mathsf{t}_{n-p}}(t)\),
- if the alternative is \(\mathrm{H}_1: \theta < \theta_0\) (less), then \(\mathrm{H}_0\) is rejected if \(T \leq \mathsf{t}_{n-p}(\alpha)\) with p-value \(p = F_{\mathsf{t}_{n-p}}(t)\).

Within this section we will work with the peat dataset, where
we realised in previous exercises that fitting a different line for the
dependence of N on depth in each group is
desired:
fit3 <- lm(N ~ depth * group, data = peat)
XLAB <- "Depth [cm]"
YLAB <- "Nitrogen concentration [weight %]"
XLIM <- range(peat[, "depth"])
YLIM <- range(peat[, "N"])
PCH <- c(21, 22, 24, 25)
DARK <- rainbow_hcl(4, c = 80, l = 35)
COL <- rainbow_hcl(4, c = 80, l = 50)
BGC <- rainbow_hcl(4, c = 80, l = 70)
TRC <- rainbow_hcl(4, c = 80, l = 70, alpha = 0.3)
names(PCH) <- names(COL) <- names(BGC) <- names(DARK) <- names(TRC) <- levels(peat[, "group"])
par(mfrow = c(1,1), mar = c(4,4,0.5,0.5))
plot(N ~ jitter(depth), data = peat,
pch = PCH[group], col = COL[group], bg = BGC[group],
xlab = XLAB, ylab = YLAB, xlim = XLIM, ylim = YLIM)
intercepts <- coef(fit3)[1] + c(0, coef(fit3)[3:5])
slopes <- coef(fit3)[2] + c(0, coef(fit3)[6:8])
for(g in 1:4){
abline(a = intercepts[g], b = slopes[g], col = DARK[g], lwd = 2)
}
legend("topleft", legend = levels(peat$group), ncol = 2, title = "Group",
col = COL, pch = PCH, pt.bg = BGC, lty = 1, lwd = 2)

We already know that the choice \(\mathbf{c} = \mathbf{e}_j = (0, \ldots, 0, 1, 0,
\ldots, 0)^\top\) with 1 at the \(j\)-th position leads to \(\theta = \beta_j\). Two-sided tests for
\(\mathrm{H}_0: \theta = 0\) are
provided by the summary output:
(sumfit3 <- summary(fit3))
##
## Call:
## lm(formula = N ~ depth * group, data = peat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36829 -0.08469 -0.00789 0.06112 0.37713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.846956 0.051193 16.544 < 2e-16 ***
## depth 0.005048 0.005724 0.882 0.379
## groupCB-VJJ 0.322113 0.064417 5.000 1.68e-06 ***
## groupVJJ -0.064041 0.075374 -0.850 0.397
## groupVJJ-CB 0.075906 0.064417 1.178 0.241
## depth:groupCB-VJJ -0.032600 0.007389 -4.412 2.03e-05 ***
## depth:groupVJJ 0.043942 0.008312 5.287 4.68e-07 ***
## depth:groupVJJ-CB 0.041591 0.007389 5.629 9.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1354 on 140 degrees of freedom
## Multiple R-squared: 0.7306, Adjusted R-squared: 0.7171
## F-statistic: 54.23 on 7 and 140 DF, p-value: < 2.2e-16
The corresponding confidence intervals can be easily obtained:
confint(fit3)
## 2.5 % 97.5 %
## (Intercept) 0.745743956 0.94816772
## depth -0.006267384 0.01636428
## groupCB-VJJ 0.194757616 0.44946860
## groupVJJ -0.213059130 0.08497760
## groupVJJ-CB -0.051449561 0.20326143
## depth:groupCB-VJJ -0.047209047 -0.01799169
## depth:groupVJJ 0.027509039 0.06037546
## depth:groupVJJ-CB 0.026982539 0.05619989
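These intervals are nothing more than \(\widehat{\beta}_j \pm \mathsf{t}_{n-p}(0.975)\, \widehat{\mathsf{se}}(\widehat{\beta}_j)\). As a sanity check, the following self-contained sketch reconstructs confint by hand; it uses the built-in cars dataset so it runs without the peat file, but the same lines apply verbatim to fit3:

```r
# Reconstruct confint() by hand: estimate +- t-quantile * standard error.
# Built-in 'cars' data is used so the snippet is self-contained;
# replacing 'fit' by fit3 reproduces the intervals shown above.
fit <- lm(dist ~ speed, data = cars)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))           # vcov(fit) = sigma2.hat * (X'X)^{-1}
q   <- qt(0.975, df = fit$df.residual)
manual <- cbind("2.5 %" = est - q * se, "97.5 %" = est + q * se)
all.equal(unname(manual), unname(confint(fit)))   # TRUE
```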
How can we adjust the p-values to correspond to one-sided tests?
cbind("less" = pt(sumfit3$coefficients[,"t value"], df = fit3$df.residual, lower.tail = T),
"greater" = pt(sumfit3$coefficients[,"t value"], df = fit3$df.residual, lower.tail = F))
## less greater
## (Intercept) 1.000000e+00 4.734139e-35
## depth 8.103674e-01 1.896326e-01
## groupCB-VJJ 9.999992e-01 8.420753e-07
## groupVJJ 1.984878e-01 8.015122e-01
## groupVJJ-CB 8.796731e-01 1.203269e-01
## depth:groupCB-VJJ 1.014731e-05 9.999899e-01
## depth:groupVJJ 9.999998e-01 2.338631e-07
## depth:groupVJJ-CB 1.000000e+00 4.781560e-08
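The three p-values are tied together: the two-sided p-value reported by summary is twice the smaller of the two one-sided ones. A quick self-contained check (again on the built-in cars data; fit3 behaves the same way):

```r
# Relation between one-sided and two-sided p-values:
#   p.two = 2 * min(p.less, p.greater)
fit  <- lm(dist ~ speed, data = cars)
tval <- summary(fit)$coefficients[, "t value"]
p.less    <- pt(tval, df = fit$df.residual, lower.tail = TRUE)
p.greater <- pt(tval, df = fit$df.residual, lower.tail = FALSE)
p.two     <- 2 * pmin(p.less, p.greater)
all.equal(unname(p.two),
          unname(summary(fit)$coefficients[, "Pr(>|t|)"]))   # TRUE
```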
What about one-sided confidence intervals? The function
confint can be tricked into providing the finite bound of
each interval. However, one must use the correct level
and keep track of which one-sided test each bound corresponds to.
bounds <- confint(fit3, level = 0.9) # a 90% two-sided CI leaves 5% in each tail
bounds <- cbind(rep(-Inf, length(coef(fit3))), bounds[,2],
bounds[,1], rep(Inf, length(coef(fit3))))
colnames(bounds) <- c("less-lower", "less-upper",
"greater-lower", "greater-upper")
bounds
## less-lower less-upper greater-lower greater-upper
## (Intercept) -Inf 0.93172223 0.762189452 Inf
## depth -Inf 0.01452562 -0.004428721 Inf
## groupCB-VJJ -Inf 0.42877514 0.215451078 Inf
## groupVJJ -Inf 0.06076423 -0.188845757 Inf
## groupVJJ-CB -Inf 0.18256797 -0.030756098 Inf
## depth:groupCB-VJJ -Inf -0.02036540 -0.044835344 Inf
## depth:groupVJJ -Inf 0.05770530 0.030179203 Inf
## depth:groupVJJ-CB -Inf 0.05382619 0.029356242 Inf
What if a different value of \(\theta_0\) than zero is desired? We need to update the test statistic by subtracting \(\theta_0\) in the numerator. In the following, we again provide tests for all coefficients with \(\theta_0 = 1\):
theta0 <- 1
tt <- (sumfit3$coefficients[,"Estimate"] - theta0) / sumfit3$coefficients[,"Std. Error"]
cbind("two-sided" = 2*pt(-abs(tt), df = fit3$df.residual),
"less" = pt(tt, df = fit3$df.residual, lower.tail = T),
"greater" = pt(tt, df = fit3$df.residual, lower.tail = F))
## two-sided less greater
## (Intercept) 3.301432e-03 1.650716e-03 0.9983493
## depth 1.989703e-165 9.948513e-166 1.0000000
## groupCB-VJJ 1.926303e-19 9.631515e-20 1.0000000
## groupVJJ 1.073963e-28 5.369813e-29 1.0000000
## groupVJJ-CB 2.821148e-29 1.410574e-29 1.0000000
## depth:groupCB-VJJ 3.117835e-152 1.558917e-152 1.0000000
## depth:groupVJJ 1.698601e-140 8.493007e-141 1.0000000
## depth:groupVJJ-CB 9.833962e-148 4.916981e-148 1.0000000
Many coefficients already have an interpretation that compares the
groups. For example, the coefficient "groupVJJ" estimates
\(\mathsf{E}[N|D=0,G=\text{"VJJ"}] -
\mathsf{E}[N|D=0,G=\text{"CB"}]\), the expected
difference in nitrogen concentration between groups VJJ and
CB at zero depth.
What if we would like to compare other groups at zero depth?
A simple solution would be to change the reference category:
peat$group2 <- relevel(peat$group, ref = "VJJ")
summary(peat$group2) # VJJ became the reference category
## VJJ CB CB-VJJ VJJ-CB
## 33 35 40 40
summary(lm(N ~ depth * group2, data = peat))
##
## Call:
## lm(formula = N ~ depth * group2, data = peat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36829 -0.08469 -0.00789 0.06112 0.37713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.782915 0.055322 14.152 < 2e-16 ***
## depth 0.048991 0.006027 8.128 2.08e-13 ***
## group2CB 0.064041 0.075374 0.850 0.3970
## group2CB-VJJ 0.386154 0.067744 5.700 6.81e-08 ***
## group2VJJ-CB 0.139947 0.067744 2.066 0.0407 *
## depth:group2CB -0.043942 0.008312 -5.287 4.68e-07 ***
## depth:group2CB-VJJ -0.076543 0.007627 -10.036 < 2e-16 ***
## depth:group2VJJ-CB -0.002351 0.007627 -0.308 0.7583
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1354 on 140 degrees of freedom
## Multiple R-squared: 0.7306, Adjusted R-squared: 0.7171
## F-statistic: 54.23 on 7 and 140 DF, p-value: < 2.2e-16
Now the coefficients estimate the differences from category
VJJ. However, this required refitting the model and
changed the interpretation of all other coefficients, which is
inadvisable when many different comparisons are required.
Let’s go back to fit3 with group CB as the
reference category. Then, the difference between VJJ and
VJJ-CB at zero depth value could be expressed as \[
\mathsf{E}[N|D=0,G=\text{"VJJ"}] -
\mathsf{E}[N|D=0,G=\text{"VJJ-CB"}] = (\beta_1 + \beta_4) -
(\beta_1 + \beta_5) = \beta_4 - \beta_5.
\] Now we just need to realize that this is simply a linear
combination of the coefficients, \(\beta_4 - \beta_5 = \mathbf{c}^\top \boldsymbol{\beta}\)
with \(\mathbf{c} = (0,0,0,1,-1,0,0,0)^\top\).
The hypothesis \(\mathrm{H}_0: \beta_4 = \beta_5\) can then be
alternatively written as \[
\mathrm{H}_0: \theta := \mathbf{c}^\top \boldsymbol{\beta} = \beta_4 -
\beta_5 = 0.
\]
How do we do that in R?
comb <- c(0,0,0,1,-1,0,0,0)
theta0 <- 0
est.theta <- comb %*% coef(fit3)
var.theta <- t(comb) %*% vcov(fit3) %*% comb
se.theta <- sqrt(diag(var.theta))
t.theta <- (est.theta - theta0)/se.theta
# p-values
c("two-sided" = 2*pt(-abs(t.theta), df = fit3$df.residual),
"less" = pt(t.theta, df = fit3$df.residual, lower.tail = T),
"greater" = pt(t.theta, df = fit3$df.residual, lower.tail = F))
## two-sided less greater
## 0.04068962 0.02034481 0.97965519
# confidence interval
c("lower" = est.theta - qt(0.975, df = fit3$df.residual) * se.theta,
"upper" = est.theta + qt(0.975, df = fit3$df.residual) * se.theta)
## lower upper
## -0.273880280 -0.006013115
Alternatively, we can use the function LSest from the
mffSM package, which can handle several linear
combinations at the same time:
# the same test as before
LSest(fit3, comb, alternative = "two.sided", conf.level = 0.95, theta0 = 0)
## Estimate Std. Error t value P value Lower Upper
## 1 -0.1399467 0.06774403 -2.065816 0.04069 -0.2738803 -0.006013115
# multiple tests at the same time - each row of L holds one linear combination
L <- matrix(c(0,0,1,0,-1,0,0,0,
0,0,0,1,-1,0,0,0,
0,0,1,-1,0,0,0,0), nrow = 3, byrow = T)
LSest(fit3, L = L)
## Estimate Std. Error t value P value Lower Upper
## 1 0.2462072 0.05529507 4.452606 1.7202e-05 0.1368858 0.355528506
## 2 -0.1399467 0.06774403 -2.065816 0.04069 -0.2738803 -0.006013115
## 3 0.3861539 0.06774403 5.700191 6.8138e-08 0.2522203 0.520087456
However, keep in mind that these p-values are not corrected for multiple testing.
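If a family-wise error control is wanted, one simple remedy (not built into LSest) is to adjust the p-values afterwards with base R's p.adjust; a sketch using the three p-values from the LSest output above:

```r
# Adjust the three LSest() p-values from above for multiple testing.
# Bonferroni multiplies each p-value by the number of tests (capped at 1);
# Holm controls the FWER as well but is uniformly more powerful.
pvals <- c(1.7202e-05, 0.04069, 6.8138e-08)
p.adjust(pvals, method = "bonferroni")
p.adjust(pvals, method = "holm")
```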
The conditional expectation of the response is modeled as a linear combination of covariates. Therefore, its prediction for specific covariate values could be viewed as a specific linear combination of the estimated coefficients: \[ \theta = \mathsf{E}\, \left[Y_{\mathrm{new}} \mid \mathbf{X}_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}} \right] = \mathbf{x}_{\mathrm{new}}^\top \boldsymbol{\beta} \quad\Longrightarrow\quad \mathbf{c} = \mathbf{x}_{\mathrm{new}}. \] Assume we wish to predict the expected nitrogen concentration for:

- group CB-VJJ at depth 3,
- group VJJ at depth 7.

The use of the LSest function requires us to define the
linear combinations, where one can easily make a mistake:
coef(fit3)
## (Intercept) depth groupCB-VJJ groupVJJ groupVJJ-CB depth:groupCB-VJJ
## 0.846955840 0.005048449 0.322113110 -0.064040764 0.075905933 -0.032600370
## depth:groupVJJ depth:groupVJJ-CB
## 0.043942252 0.041591216
L <- matrix(c(1, 3, 1, 0, 0, 3, 0, 0,
1, 7, 0, 1, 0, 0, 7, 0), nrow = 2, byrow = T)
LSest(fit3, L, theta0 = 1)
## Estimate Std. Error t value P value Lower Upper
## 1 1.086413 0.02842649 3.039882 0.0028253 1.030212 1.142614
## 2 1.125850 0.02485157 5.064065 1.2716e-06 1.076717 1.174983
It is much more convenient to use the predict function in
such situations; we only need to supply a data.frame
containing all the needed covariates:
newdata <- data.frame(depth = c(3,7), group = c("CB-VJJ", "VJJ"))
predict(fit3, newdata = newdata, level = 0.95, interval = "confidence")
## fit lwr upr
## 1 1.086413 1.030212 1.142614
## 2 1.125850 1.076717 1.174983
However, it only provides the point estimator and the confidence interval, not a p-value: the function is designed for prediction, not for statistical testing.
Our task is to perform statistical test of hypothesis \[ \mathrm{H}_0: \mathsf{E} \left[Y \mid \mathbf{X} = \mathbf{x}_1 \right] = \mathsf{E} \left[Y \mid \mathbf{X} = \mathbf{x}_2 \right], \] where \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are different covariate values. The null hypothesis could be rewritten as \[ \mathrm{H}_0: \mathsf{E} \left[Y \mid \mathbf{X} = \mathbf{x}_1 \right] - \mathsf{E} \left[Y \mid \mathbf{X} = \mathbf{x}_2 \right] = \mathbf{x}_1^\top \boldsymbol{\beta} - \mathbf{x}_2^\top \boldsymbol{\beta} = \left(\mathbf{x}_1- \mathbf{x}_2\right)^\top \boldsymbol{\beta} \qquad = 0. \] Then, all we need to do is to set \(\mathbf{c} = \mathbf{x}_1- \mathbf{x}_2\).
The test can again be performed by LSest or by manual
calculation. However, we need to compute the vector
determining the linear combination carefully:
# Manually, where an error can easily occur...
L <- matrix(c(1, 3, 1, 0, 0, 3, 0, 0,
1, 7, 0, 1, 0, 0, 7, 0), nrow = 2, byrow = T)
(comb <- L[1,] - L[2,])
## [1] 0 -4 1 -1 0 3 -7 0
# Clever use of model.matrix()
newdata <- data.frame(depth = c(3,7), group = c("CB-VJJ", "VJJ"),
N = c(0,0)) # requires some arbitrary response values, otherwise error
L <- model.matrix(fit3, data = newdata)
(comb <- L[1,] - L[2,])
## (Intercept) depth groupCB-VJJ groupVJJ groupVJJ-CB depth:groupCB-VJJ
## 0 -4 1 -1 0 3
## depth:groupVJJ depth:groupVJJ-CB
## -7 0
LSest(fit3, comb)
## Estimate Std. Error t value P value Lower Upper
## 1 -0.03943679 0.03775799 -1.044462 0.29807 -0.1140864 0.03521279
Now we are interested in predicting a new observation \(Y_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}}^\top \boldsymbol{\beta} + \varepsilon_\mathrm{new}\) given a set of new covariates \(\mathbf{X}_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}}\), i.e. the actual value of \(Y_{\mathrm{new}}\), not just \(\mathsf{E}\, \left[Y_{\mathrm{new}} \mid \mathbf{X}_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}}\right] = \mathbf{x}_{\mathrm{new}}^\top \boldsymbol{\beta}\). The point estimator remains unchanged since the expected value of \(\varepsilon_{\mathrm{new}}\) is zero. A substantial change, however, occurs in the width of the prediction intervals compared to the confidence intervals.
Since the training data \(\mathbf{Y},
\mathbb{X}\) and \(\varepsilon_{\mathrm{new}}\) are assumed to
be independent, we can combine the two random variables and obtain:
\[
\left.
\mathbf{x}_{\mathrm{new}}^\top \widehat{\boldsymbol{\beta}} -
\varepsilon_{\mathrm{new}}
\,\right|\, \mathbb{X}
\sim \mathsf{N}\left(
\mathbf{x}_{\mathrm{new}}^\top\boldsymbol{\beta} + 0,\quad
\sigma^2 \mathbf{x}_{\mathrm{new}}^\top \left(\mathbb{X}^\top
\mathbb{X}\right)^{-1} \mathbf{x}_{\mathrm{new}}
\;+\; \sigma^2
\right)
\] and consequently \[
\left.
\frac{
\mathbf{x}_{\mathrm{new}}^\top \widehat{\boldsymbol{\beta}} -
Y_{\mathrm{new}}
}{
\sqrt{
\widehat{\sigma}^2
\left(1 + \mathbf{x}_{\mathrm{new}}^\top \left(\mathbb{X}^\top
\mathbb{X}\right)^{-1} \mathbf{x}_{\mathrm{new}}\right)}
}
=
\frac{
\mathbf{x}_{\mathrm{new}}^\top \widehat{\boldsymbol{\beta}} -
\varepsilon_{\mathrm{new}} -
\mathbf{x}_\mathrm{new}^\top\boldsymbol{\beta}
}{
\sqrt{
\frac{\mathsf{RSS}}{n-p}
\frac{\sigma^2}{\sigma^2}
\left(1 + \mathbf{x}_{\mathrm{new}}^\top \left(\mathbb{X}^\top
\mathbb{X}\right)^{-1} \mathbf{x}_{\mathrm{new}}\right)}
}
\,\right|\, \mathbb{X}
\sim \mathsf{t}_{n-p}.
\] This result is used to construct the prediction intervals:
\[
\mathsf{P} \left(
\mathbf{x}_{\mathrm{new}}^\top \widehat{\boldsymbol{\beta}}
\pm
\mathsf{t}_{n-p}\left(1-\frac{\alpha}{2}\right)\sqrt{\widehat{\sigma}^2\left[1
+ \mathbf{x}_{\mathrm{new}}^\top \left(\mathbb{X}^\top
\mathbb{X}\right)^{-1} \mathbf{x}_{\mathrm{new}}\right]}
\;\ni Y_{\mathrm{new}}
\right) = 1-\alpha.
\] The prediction intervals are provided by the predict
function after changing the interval parameter to
prediction:
newdata <- data.frame(depth = c(3,7), group = c("CB-VJJ", "VJJ"))
predict(fit3, newdata = newdata, level = 0.95, interval = "confidence")
## fit lwr upr
## 1 1.086413 1.030212 1.142614
## 2 1.125850 1.076717 1.174983
predict(fit3, newdata = newdata, level = 0.95, interval = "prediction")
## fit lwr upr
## 1 1.086413 0.8127977 1.360029
## 2 1.125850 0.8535983 1.398102
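The prediction-interval formula above can be verified by hand. A self-contained sketch on the built-in cars data (the same computation underlies the fit3 output above; the covariate value speed = 21 is just an arbitrary choice for illustration):

```r
# Manual prediction interval:
#   x'beta.hat +- t_{n-p}(0.975) * sqrt(sigma2.hat * (1 + x'(X'X)^{-1}x))
fit  <- lm(dist ~ speed, data = cars)
xnew <- c(1, 21)                             # intercept and speed = 21
s2   <- summary(fit)$sigma^2                 # sigma2.hat = RSS / (n - p)
XtXi <- solve(crossprod(model.matrix(fit)))  # (X'X)^{-1}
fitv <- drop(crossprod(xnew, coef(fit)))
hw   <- qt(0.975, df = fit$df.residual) *
  sqrt(s2 * (1 + drop(t(xnew) %*% XtXi %*% xnew)))
manual <- c(fit = fitv, lwr = fitv - hw, upr = fitv + hw)
auto   <- predict(fit, newdata = data.frame(speed = 21),
                  interval = "prediction", level = 0.95)
all.equal(manual, drop(auto), check.attributes = FALSE)   # TRUE
```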
The prediction intervals can be computed for a wide range of \(\mathbf{x}_{\mathrm{new}}\) values. These prediction bands should cover \(100(1-\alpha)\) % of the observations. First, let us visualize this for one of the groups:
g <- "CB-VJJ"
xgrid <- seq(0, 14, length.out = 101)
conf_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "confidence", level = 0.95)
pred_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "prediction", level = 0.95)
par(mfrow = c(1,1), mar = c(4,4,2,1))
plot(N ~ depth, data = peat[peat$group == g,],
main = paste("Group", g),
pch = PCH[g], col = COL[g], bg = BGC[g],
xlab = XLAB, ylab = YLAB, xlim = XLIM, ylim = YLIM)
lines(xgrid, pred_g[,"fit"], col = COL[g], lwd = 2, lty = 1)
lines(xgrid, pred_g[,"lwr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, pred_g[,"upr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, conf_g[,"lwr"], col = "blue", lwd = 2, lty = 3)
lines(xgrid, conf_g[,"upr"], col = "blue", lwd = 2, lty = 3)
# Where is the band narrowest?
abline(v = mean(peat$depth[peat$group == g]), col = "grey")
abline(h = mean(peat$N[peat$group == g]), col = "grey")
legend("topright", legend = c("Point estimator", "Prediction CI", "Confidence CI"),
col = c(COL[g], DARK[g], "blue"), lwd = 2, lty = 1:3, bty = "n", ncol = 3)
And now with all groups together:
xgrid <- seq(0, 14, length.out = 101)
par(mfrow = c(1,1), mar = c(4,4,2,1))
plot(N ~ depth, data = peat,
main = "Prediction bands for all groups",
pch = PCH[group], col = COL[group], bg = BGC[group],
xlab = XLAB, ylab = YLAB, xlim = XLIM, ylim = YLIM)
for(g in levels(peat$group)){
conf_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "confidence", level = 0.95)
pred_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "prediction", level = 0.95)
polygon(c(xgrid, rev(xgrid)), c(pred_g[,"lwr"], rev(pred_g[,"upr"])),
border = DARK[g], col = TRC[g], lty = 2)
lines(xgrid, pred_g[,"fit"], col = COL[g], lwd = 2, lty = 1)
}
legend("topleft", legend = levels(peat$group),
col = COL, lwd = 2, lty = 2, bty = "n", ncol = 2)

Let us recall the distribution of \(\widehat{\boldsymbol{\beta}}\) under the normality assumption in the full-rank model: \[ \left. \dfrac{\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}}{\sqrt{\sigma^2}} \,\right|\, \mathbb{X} \sim \mathsf{N}_p\left(\mathbf{0}_p, \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\right). \] Since the variance matrix is symmetric and invertible, we can apply the Cholesky decomposition to get \(\left(\mathbb{X}^\top \mathbb{X}\right)^{-1} = \left(\mathbb{X}^\top \mathbb{X}\right)^{-\frac{1}{2}} \left(\left(\mathbb{X}^\top \mathbb{X}\right)^{-\frac{1}{2}}\right)^\top\) and define \(\left(\mathbb{X}^\top \mathbb{X}\right)^{\frac{1}{2}}\) as its inverse. Then, \[ \left. \dfrac{1}{\sqrt{\sigma^2}}\left(\mathbb{X}^\top \mathbb{X}\right)^{\frac{1}{2}} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \,\right|\, \mathbb{X} \sim \mathsf{N}_p\left(\mathbf{0}_p, \left(\mathbb{X}^\top \mathbb{X}\right)^{\frac{1}{2}} \left(\mathbb{X}^\top \mathbb{X}\right)^{-1} \left(\left(\mathbb{X}^\top \mathbb{X}\right)^{\frac{1}{2}}\right)^\top\right) = \mathsf{N}_p\left(\mathbf{0}_p, \mathbb{I}_p\right) . \] Therefore, the sum of its squares has a \(\chi^2\) distribution: \[ \left. \frac{1}{\sigma^2} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\top \left(\mathbb{X}^\top \mathbb{X}\right) \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \,\right|\, \mathbb{X} \sim \chi_p^2, \] which is still independent of \(\mathsf{RSS}\). Hence, \[ \left. 
\dfrac{ \dfrac{ \frac{1}{\sigma^2} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\top \left(\mathbb{X}^\top \mathbb{X}\right) \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)\sim \chi_p^2 }{ p } }{ \dfrac{ \frac{1}{\sigma^2}\mathsf{RSS} \sim \chi_{n-p}^2 }{ (n-p) } } = \frac{1}{p} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\top \left[\frac{1}{\widehat{\sigma}^2} \mathbb{X}^\top \mathbb{X}\right] \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \,\right|\, \mathbb{X} \quad \sim \quad \mathsf{F}_{p,n-p}, \] which can be used for constructing confidence regions for simultaneous inference: \[ \mathsf{CR}(1-\alpha) = \left\{ \boldsymbol{\beta} \in \mathbb{R}^p: \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\top \left[\frac{1}{\widehat{\sigma}^2} \mathbb{X}^\top \mathbb{X}\right] \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \leq p \,\mathsf{F}_{p,n-p}(1-\alpha) \right\}, \] which satisfies \(\mathsf{P}\left(\mathsf{CR}(1-\alpha) \ni \boldsymbol{\beta}_0\right) = 1-\alpha\) for the true value \(\boldsymbol{\beta}_0\). Clearly, \(\mathsf{CR}(1-\alpha)\) is an ellipsoid centred at \(\widehat{\boldsymbol{\beta}}\).
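The defining quadratic-form condition of \(\mathsf{CR}(1-\alpha)\) can be checked directly in R. Below is a hypothetical helper (the function name in_CR is our own, not part of any package), illustrated on the built-in cars data so the snippet is self-contained; it applies to fit3 unchanged:

```r
# Membership test for the confidence region CR(1 - alpha):
# (beta.hat - beta0)' [X'X / sigma2.hat] (beta.hat - beta0) <= p * F_{p,n-p}(1 - alpha)
in_CR <- function(fit, beta0, level = 0.95) {
  d    <- coef(fit) - beta0
  stat <- drop(t(d) %*% crossprod(model.matrix(fit)) %*% d) / summary(fit)$sigma^2
  stat <= fit$rank * qf(level, df1 = fit$rank, df2 = fit$df.residual)
}
fit <- lm(dist ~ speed, data = cars)
in_CR(fit, coef(fit))   # the centre of the ellipsoid is always inside: TRUE
in_CR(fit, c(0, 0))     # is the origin covered by the 95% region?
```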
In R we can use the confidenceEllipse function (see
?ellipse for other usages) from the package car,
which automatically extracts the required parameters to plot the ellipse.
However, only a 2D ellipse can be plotted, so only two coefficients
can be chosen at a time. Let's compare the confidence regions (blue)
with the rectangles formed by the confidence intervals (red).
CIs <- confint(fit3)
par(mfrow = c(2,2), mar = c(4,4,0.5,0.5))
confidenceEllipse(fit3, which.coef = 1:2)
abline(v = CIs[1,], col = "red", lty = 2)
abline(h = CIs[2,], col = "red", lty = 2)
confidenceEllipse(fit3, which.coef = 3:4)
abline(v = CIs[3,], col = "red", lty = 2)
abline(h = CIs[4,], col = "red", lty = 2)
confidenceEllipse(fit3, which.coef = 5:6)
abline(v = CIs[5,], col = "red", lty = 2)
abline(h = CIs[6,], col = "red", lty = 2)
confidenceEllipse(fit3, which.coef = 7:8)
abline(v = CIs[7,], col = "red", lty = 2)
abline(h = CIs[8,], col = "red", lty = 2)

The result from the previous part can be extended even further. We know that \[ \mathsf{P}\left[ \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right)^\top \left[\frac{1}{\widehat{\sigma}^2} \mathbb{X}^\top \mathbb{X}\right] \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right) \leq p \,\mathsf{F}_{p,n-p}(1-\alpha) \right] = 1-\alpha \] if \(\boldsymbol{\beta}_0\) is the true value of the parameter \(\boldsymbol{\beta}\). We wish to extend the result to any linear combination \(\mathbf{x}^\top\boldsymbol{\beta}\). Writing \(\mathbf{x}^\top\left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right)\) as the inner product of \(\left(\mathbb{X}^\top \mathbb{X}\right)^{\frac{1}{2}} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right)\) and \(\left(\left(\mathbb{X}^\top \mathbb{X}\right)^{-\frac{1}{2}}\right)^\top \mathbf{x}\), the Cauchy–Schwarz inequality yields, for every \(\mathbf{x}\in\mathbb{R}^p\), \[ \left[ \mathbf{x}^\top\widehat{\boldsymbol{\beta}} - \mathbf{x}^\top\boldsymbol{\beta}_0 \right]^2 \leq \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right)^\top \mathbb{X}^\top \mathbb{X} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right) \cdot \mathbf{x}^\top \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{x}, \] with equality attained for \(\mathbf{x} = \mathbb{X}^\top \mathbb{X} \left(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0\right)\) (and trivially for \(\mathbf{x} = \mathbf{0}\)). Therefore, the event defining the confidence region coincides with the event that the corresponding bound holds for all \(\mathbf{x}\in\mathbb{R}^p\) simultaneously: \[\begin{align*} \mathsf{P}\left[ \forall \mathbf{x}\in\mathbb{R}^p: \left[ \mathbf{x}^\top\widehat{\boldsymbol{\beta}} - \mathbf{x}^\top\boldsymbol{\beta}_0 \right]^2 \leq p \,\mathsf{F}_{p,n-p}(1-\alpha) \, \widehat{\sigma}^2 \mathbf{x}^\top \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{x} \right] &= 1-\alpha \\ \mathsf{P}\left[ \forall \mathbf{x}\in\mathbb{R}^p: \left| \mathbf{x}^\top\widehat{\boldsymbol{\beta}} - \mathbf{x}^\top\boldsymbol{\beta}_0 \right| \leq \sqrt{p \,\mathsf{F}_{p,n-p}(1-\alpha) \, \widehat{\sigma}^2 \mathbf{x}^\top \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{x}} \right] &= 1-\alpha \\ \mathsf{P}\left[ \forall \mathbf{x}\in\mathbb{R}^p: \mathbf{x}^\top\widehat{\boldsymbol{\beta}} \pm \sqrt{p \,\mathsf{F}_{p,n-p}(1-\alpha) \, \widehat{\sigma}^2 \mathbf{x}^\top \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{x}} \quad \ni \mathbf{x}^\top\boldsymbol{\beta}_0 \right] &= 1-\alpha. \end{align*}\] This will be called the confidence band FOR the regression function.
On the other hand we have confidence band AROUND the regression function which is based on pointwise coverage provided by the result for any \(\theta = \mathbf{c}^\top \boldsymbol{\beta}\): \[ \mathsf{P}\left[ \mathbf{c}^\top\widehat{\boldsymbol{\beta}} \pm \mathsf{t}_{n-p}\left(1-\frac{\alpha}{2}\right) \, \sqrt{\widehat{\sigma}^2 \mathbf{c}^\top \left(\mathbb{X}^\top \mathbb{X}\right)^{-1}\mathbf{c}} \quad \ni \mathbf{c}^\top\boldsymbol{\beta}_0 \right] = 1-\alpha. \] Since \(\mathsf{t}_{n-p}^2 = \mathsf{F}_{1, n-p}\), the difference in used quantiles is: \[ \mathrm{AROUND}: \mathsf{t}_{n-p}\left(1-\frac{\alpha}{2}\right) =\sqrt{\mathsf{F}_{1, n-p}(1-\alpha)} \qquad \text{vs.} \qquad \mathrm{FOR}: \sqrt{p\,\mathsf{F}_{p, n-p}(1-\alpha)}, \] where the second one is always larger. In our case we have
(qaround <- qt(0.975, df = fit3$df.residual))
## [1] 1.977054
sqrt(qf(0.95, df1 = 1, df2 = fit3$df.residual))
## [1] 1.977054
(qfor <- sqrt(fit3$rank * qf(0.95, df1 = fit3$rank, df2 = fit3$df.residual)))
## [1] 4.005126
Let us compute the bands for one of the groups first:
g <- "CB-VJJ"
xgrid <- seq(0, 14, length.out = 101)
band_around_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "confidence", level = 0.95, se.fit = TRUE)
pred_g <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "prediction", level = 0.95)
# Manual computations: confidence band FOR regression function (not provided by predict!)
band_for_g <- data.frame(fit = band_around_g$fit[,"fit"],
lwr = band_around_g$fit[,"fit"] - qfor * band_around_g$se.fit,
upr = band_around_g$fit[,"fit"] + qfor * band_around_g$se.fit)
# Plot the bands:
par(mfrow = c(1,1), mar = c(4,4,2,1))
plot(N ~ depth, data = peat[peat$group == g,],
main = paste("Group", g),
pch = PCH[g], col = COL[g], bg = BGC[g],
xlab = XLAB, ylab = YLAB, xlim = XLIM, ylim = YLIM)
lines(xgrid, pred_g[,"fit"], col = COL[g], lwd = 2, lty = 1)
lines(xgrid, pred_g[,"lwr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, pred_g[,"upr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, band_around_g$fit[,"lwr"], col = "blue", lwd = 2, lty = 3)
lines(xgrid, band_around_g$fit[,"upr"], col = "blue", lwd = 2, lty = 3)
lines(xgrid, band_for_g[,"lwr"], col = "red", lwd = 2, lty = 4)
lines(xgrid, band_for_g[,"upr"], col = "red", lwd = 2, lty = 4)
# Where is the band narrowest?
abline(v = mean(peat$depth[peat$group == g]), col = "grey")
abline(h = mean(peat$N[peat$group == g]), col = "grey")
legend("topright", legend = c("Point estimator", "Prediction band", "Band AROUND", "Band FOR"),
col = c(COL[g], DARK[g], "blue", "red"), lwd = 2, lty = 1:4, bty = "n", ncol = 2)
However, the band FOR the regression function holds simultaneously for all
groups. The same model gives us several regression lines, each with its own
set of bands:
xgrid <- seq(0, 14, length.out = 101)
band_around <- pred <- band_for <- list()
par(mfrow = c(2,2), mar = c(4,4,2,1))
for(g in levels(peat$group)){
band_around[[g]] <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "confidence", level = 0.95, se.fit = TRUE)
pred[[g]] <- predict(fit3, newdata = data.frame(group = g, depth = xgrid),
interval = "prediction", level = 0.95)
# Manual computations: confidence band FOR regression function (not provided by predict!)
band_for[[g]] <- data.frame(fit = band_around[[g]]$fit[,"fit"],
lwr = band_around[[g]]$fit[,"fit"] - qfor * band_around[[g]]$se.fit,
upr = band_around[[g]]$fit[,"fit"] + qfor * band_around[[g]]$se.fit)
# Plot the bands:
plot(N ~ depth, data = peat[peat$group == g,],
main = paste("Group", g),
pch = PCH[g], col = COL[g], bg = BGC[g],
xlab = XLAB, ylab = YLAB, xlim = XLIM, ylim = YLIM)
lines(xgrid, pred[[g]][,"fit"], col = COL[g], lwd = 2, lty = 1)
lines(xgrid, pred[[g]][,"lwr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, pred[[g]][,"upr"], col = DARK[g], lwd = 2, lty = 2)
lines(xgrid, band_around[[g]]$fit[,"lwr"], col = "blue", lwd = 2, lty = 3)
lines(xgrid, band_around[[g]]$fit[,"upr"], col = "blue", lwd = 2, lty = 3)
lines(xgrid, band_for[[g]][,"lwr"], col = "red", lwd = 2, lty = 4)
lines(xgrid, band_for[[g]][,"upr"], col = "red", lwd = 2, lty = 4)
}
