We will need library("nlme")
to access functions for longitudinal data analysis and library("lattice")
for high-level plotting functions.
The examples use dataset Orthodont available in the package nlme
. Access it by calling data(Orthodont, package = "nlme")
.
This data frame contains the following columns:
distance
A numeric vector of distances (mm) from the pituitary (hypophysis gland) to the pterygomaxillary fissure (see the picture below). These distances are measured on x-ray images of the skull.age
A numeric vector of ages of the subject (yr).Subject
An ordered factor indicating the subject on which the measurement was made. The levels are labelled M01
to M16
for the males and F01
to F13
for the females. The ordering is by increasing average distance within sex.Sex
A factor with levels Male
and Female
.There are four measurements of distance
on each subject, obtained at ages 8, 10, 12, and 14. The goal of the analysis is to describe how the distance changes with age, how the growth varies between subjects and how it depends on gender.
As you can see by checking
str(Orthodont)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 108 obs. of 4 variables:
## $ distance: num 26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
## $ age : num 8 10 12 14 8 10 12 14 8 10 ...
## $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "outer")=Class 'formula' language ~Sex
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## - attr(*, "formula")=Class 'formula' language distance ~ age | Subject
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Age"
## ..$ y: chr "Distance from pituitary to pterygomaxillary fissure"
## - attr(*, "units")=List of 2
## ..$ x: chr "(yr)"
## ..$ y: chr "(mm)"
## - attr(*, "FUN")=function (x)
## ..- attr(*, "source")= chr "function (x) max(x, na.rm = TRUE)"
## - attr(*, "order.groups")= logi TRUE
head(Orthodont)
## Grouped Data: distance ~ age | Subject
## distance age Subject Sex
## 1 26.0 8 M01 Male
## 2 25.0 10 M01 Male
## 3 29.0 12 M01 Male
## 4 31.0 14 M01 Male
## 5 21.5 8 M02 Male
## 6 22.5 10 M02 Male
the dataset has an additional special structure. It is a groupedData
object. Let us explain how such objects can be constructed.
Remove the groupedData structure from Orthodont by creating a classical data.frame called Ort
:
Ort <- as.data.frame(Orthodont)
str(Ort)
## 'data.frame': 108 obs. of 4 variables:
## $ distance: num 26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
## $ age : num 8 10 12 14 8 10 12 14 8 10 ...
## $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
head(Ort)
## distance age Subject Sex
## 1 26.0 8 M01 Male
## 2 25.0 10 M01 Male
## 3 29.0 12 M01 Male
## 4 31.0 14 M01 Male
## 5 21.5 8 M02 Male
## 6 22.5 10 M02 Male
Now recreate a grouped data object it by running the command
Ort2 <- groupedData(formula = distance ~ age | Subject,
data = Ort, outer = ~Sex,
labels = list(x = "Age", y = "Distance from pituitary to pterygomaxillary fissure"),
units = list(x = "[yr]", y = "[mm]"))
str(Ort2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 108 obs. of 4 variables:
## $ distance: num 26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
## $ age : num 8 10 12 14 8 10 12 14 8 10 ...
## $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "formula")=Class 'formula' language distance ~ age | Subject
## .. ..- attr(*, ".Environment")=<environment: 0x906d1a8>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Age"
## ..$ y: chr "Distance from pituitary to pterygomaxillary fissure"
## - attr(*, "units")=List of 2
## ..$ x: chr "[yr]"
## ..$ y: chr "[mm]"
## - attr(*, "outer")=Class 'formula' language ~Sex
## .. ..- attr(*, ".Environment")=<environment: 0x906d1a8>
## - attr(*, "FUN")=function (x)
## - attr(*, "order.groups")= logi TRUE
head(Ort2)
## Grouped Data: distance ~ age | Subject
## <environment: 0x906d1a8>
## distance age Subject Sex
## 1 26.0 8 M01 Male
## 2 25.0 10 M01 Male
## 3 29.0 12 M01 Male
## 4 31.0 14 M01 Male
## 5 21.5 8 M02 Male
## 6 22.5 10 M02 Male
This call to the groupedData
function specifies the following:
formula
defines the grouping structure, the response variable and the primary covariate (usually time in case of longitudinal data). The groups are specified by listing the variable Subject
after the vertical bar at the end of the formula. The primary covariate in this case is age.outer
defines additional outer covariates of interest. Outer covariate is a covariate measured at the group level (all measurements within the same group share the same value of the outer covariate). [The opposite: inner covariate.]labels
and units
arguments specify the labels and units for the response (component y
) and the primary covariate (component x
). These arguments are optional.See also help(groupedData)
. Functions available in library("nlme")
are able to use the structure specified by the groupedData
object.
The standard R functions for descriptive analysis may not be useful for describing grouped data because they do not take into account the group structure. Problems may be caused, e.g., by the repetition of the values of outer variables on each subject. For example:
summary(Ort)
## distance age Subject Sex
## Min. :16.50 Min. : 8.0 M16 : 4 Male :64
## 1st Qu.:22.00 1st Qu.: 9.5 M05 : 4 Female:44
## Median :23.75 Median :11.0 M02 : 4
## Mean :24.02 Mean :11.0 M11 : 4
## 3rd Qu.:26.00 3rd Qu.:12.5 M07 : 4
## Max. :31.50 Max. :14.0 M08 : 4
## (Other):84
Instead of using summary
, one can use gsummary
on groupedData
object to see summaries of variables by group. Continuous variables are summarized by mean, which can be changed by providing the argument FUN
.
gsummary(Ort2)
## distance age Subject Sex
## M16 23.000 11 M16 Male
## M05 23.000 11 M05 Male
## M02 23.375 11 M02 Male
## M11 23.625 11 M11 Male
## M07 23.750 11 M07 Male
## M08 23.875 11 M08 Male
## M03 24.250 11 M03 Male
## M12 24.250 11 M12 Male
## M13 24.250 11 M13 Male
## M14 24.875 11 M14 Male
## M09 25.125 11 M09 Male
## M15 25.875 11 M15 Male
## M06 26.375 11 M06 Male
## M04 26.625 11 M04 Male
## M01 27.750 11 M01 Male
## M10 29.500 11 M10 Male
## F10 18.500 11 F10 Female
## F09 21.125 11 F09 Female
## F06 21.125 11 F06 Female
## F01 21.375 11 F01 Female
## F05 22.625 11 F05 Female
## F07 23.000 11 F07 Female
## F02 23.000 11 F02 Female
## F08 23.375 11 F08 Female
## F03 23.750 11 F03 Female
## F04 24.875 11 F04 Female
## F11 26.375 11 F11 Female
gsummary(Ort2, FUN = max)
## distance age Subject Sex
## M16 25.0 14 M16 Male
## M05 26.0 14 M05 Male
## M02 26.5 14 M02 Male
## M11 25.0 14 M11 Male
## M07 26.5 14 M07 Male
## M08 25.5 14 M08 Male
## M03 27.5 14 M03 Male
## M12 28.0 14 M12 Male
## M13 29.5 14 M13 Male
## M14 26.0 14 M14 Male
## M09 31.0 14 M09 Male
## M15 30.0 14 M15 Male
## M06 28.5 14 M06 Male
## M04 27.5 14 M04 Male
## M01 31.0 14 M01 Male
## M10 31.5 14 M10 Male
## F10 19.5 14 F10 Female
## F09 22.0 14 F09 Female
## F06 22.5 14 F06 Female
## F01 23.0 14 F01 Female
## F05 23.5 14 F05 Female
## F07 25.0 14 F07 Female
## F02 25.5 14 F02 Female
## F08 24.0 14 F08 Female
## F03 26.0 14 F03 Female
## F04 26.5 14 F04 Female
## F11 28.0 14 F11 Female
This output can be further summarized across all groups by the function summary
:
summary(gsummary(Ort2))
## distance age Subject Sex
## Min. :18.50 Min. :11 M16 : 1 Male :16
## 1st Qu.:23.00 1st Qu.:11 M05 : 1 Female:11
## Median :23.75 Median :11 M02 : 1
## Mean :24.02 Mean :11 M11 : 1
## 3rd Qu.:25.00 3rd Qu.:11 M07 : 1
## Max. :29.50 Max. :11 M08 : 1
## (Other):21
Compare this to the output of
summary(Ort2)
## distance age Subject Sex
## Min. :16.50 Min. : 8.0 M16 : 4 Male :64
## 1st Qu.:22.00 1st Qu.: 9.5 M05 : 4 Female:44
## Median :23.75 Median :11.0 M02 : 4
## Mean :24.02 Mean :11.0 M11 : 4
## 3rd Qu.:26.00 3rd Qu.:12.5 M07 : 4
## Max. :31.50 Max. :14.0 M08 : 4
## (Other):84
The package nlme
adds methods for creating subject-level plots of groupedData
objects.
plot(Ort2)
Note that such plot is only useful if the number of subjects is not too high. With a high number of subjects, user-tailored plots must be used! Another strategy is to select (randomly) a reasonable number of subjects and then to create plots for only this subset.
A two-panel plot by gender is obtained by calling
plot(Ort2, outer = TRUE, aspect = "fill")
In case there are more subject-specific covariates in the data, only some of them can be selected to split the plot:
plot(Ort2, outer = ~Sex, aspect = "fill")
For more details, see help(plot.nfnGroupedData)
.
It is not appropriate to analyze longitudinal and in general grouped data by a linear model using least squares. Nevertheless, we will still try this model to compare the results with the analysis by the linear mixed model we will consider later. Fitting a classical linear model can (sometimes!!!) be useful in an exploratory part of the analysis. A classical linear model is useful exploratory tool in two ways:
Using ordinary least squares (OLS) applied to the full data set while ignoring dependencies. This basically means fitting a marginal model in which we (mostly incorrectly) assume homoscedastic and independent errors. This is especially useful to decide (at least preliminary) on the fixed effects
part of the model based on residual plots used for a classical linear model. Nevertheless, do not forget that formal tests are in general invalid due to correlated errors which are also not necessarily homoscedastic.
Using OLS applied separately to each subject. This is especially useful to decide (again at least preliminary) on the random effects
part of the model. When doing so, we can
Use residuals from point 1 as outcome (= detrended observations);
Assume (and estimate) common residual variance for all subjects (which is basically achieved by considering Subject
as a categorical covariate (factor
) in the linear model).
In case there are no time-dependent regressors except time and no or only few (and categorical) outer
regressors, steps 1 and 2 can be done almost together.
To obtain a meaningful interpretation of the intercept, we subtract 8 (the age at the initial measurement) from age. Then the intercept estimates the expected value at the start of the study.
Let us first fit a classical linear model while assuming a linear trend over time with possibly different parameters of the line for male and female.
fitlm <- lm(distance ~ I(age - 8) * Sex, data = Ort2)
summary(fitlm)
##
## Call:
## lm(formula = distance ~ I(age - 8) * Sex, data = Ort2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6156 -1.3219 -0.1682 1.3299 5.2469
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.6156 0.4721 47.907 < 2e-16 ***
## I(age - 8) 0.7844 0.1262 6.217 1.07e-08 ***
## SexFemale -1.4065 0.7396 -1.902 0.060 .
## I(age - 8):SexFemale -0.3048 0.1977 -1.542 0.126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.257 on 104 degrees of freedom
## Multiple R-squared: 0.4227, Adjusted R-squared: 0.4061
## F-statistic: 25.39 on 3 and 104 DF, p-value: 2.108e-12
Classical basic residual plots:
par(mfrow = c(2, 2))
plot(fitlm, pch = 23, col = "darkblue", bg = "skyblue")
par(mfrow = c(1, 1))
Especially from the first plot we can conclude that the linear trend with parameters of the lines depending on Sex
is a reasonable model for the marginal evolution of the response, i.e., for the fixed effects
part of the subsequent linear mixed model.
One more classical plot (mainly to see how the trends for male and female differ):
COL <- c("darkblue", "red")
BG <- c("skyblue", "pink")
PCH <- c(24, 25)
names(COL) <- names(BG) <- names(PCH) <- c("Male", "Female")
plot(distance ~ age, col = COL[Sex], pch = PCH[Sex], bg = BG[Sex], data = Ort2)
abline(lm(distance ~ age, data = subset(Ort2, Sex == "Female")), col="red", lwd = 2)
abline(lm(distance ~ age, data = subset(Ort2, Sex == "Male")), col = "blue", lwd = 2)
However, the residuals for measurements taken on the same subject are correlated, as the following plot testifies.
bwplot(Ort2$Subject ~ residuals(fitlm))
But we should not be surprised by that…
Let us now explore subject-specific evolutions of the response over time. In our dataset, we have just one outer
covariate (Sex
) which, moreover, has only two levels. Hence it is not really necessary to detrend the outcomes before fitting subject-specific models. We can easily take Sex
into account at later stage as will be shown.
We will now consider fitting subject-specific regression lines to these data. At the same time, we will assume the same residual variance for all subjects. The subject-specific regression lines are obtained from the function lmList
.
fitslm <- lmList(Ort2)
The information on how to identify the subjects and what models to fit is extracted from the groupedData
structure of Ort2
. The structure of the model can also be specified explicitely (and must be specified in this way if a simpler linear model is to be fitted than that implied by the groupedData
structure). The lmList
function can also be used with data stored in a standard data.frame
:
fitslm <- lmList(distance ~ age | Subject, data = Ort)
The estimated subject-specific coefficients (intercepts and slopes) can be summarized and the residuals plotted easily.
summary(fitslm)
## Call:
## Model: distance ~ age | Subject
## Data: Ort
##
## Coefficients:
## (Intercept)
## Estimate Std. Error t value Pr(>|t|)
## M16 16.95 3.288173 5.1548379 3.695247e-06
## M05 13.65 3.288173 4.1512411 1.181678e-04
## M02 14.85 3.288173 4.5161854 3.458934e-05
## M11 20.05 3.288173 6.0976106 1.188838e-07
## M07 14.95 3.288173 4.5465974 3.116705e-05
## M08 19.75 3.288173 6.0063745 1.665712e-07
## M03 16.00 3.288173 4.8659237 1.028488e-05
## M12 13.25 3.288173 4.0295930 1.762580e-04
## M13 2.80 3.288173 0.8515366 3.982319e-01
## M14 19.10 3.288173 5.8086964 3.449588e-07
## M09 14.40 3.288173 4.3793313 5.509579e-05
## M15 13.50 3.288173 4.1056231 1.373664e-04
## M06 18.95 3.288173 5.7630783 4.078189e-07
## M04 24.70 3.288173 7.5117696 6.081644e-10
## M01 17.30 3.288173 5.2612799 2.523621e-06
## M10 21.25 3.288173 6.4625549 3.065505e-08
## F10 13.55 3.288173 4.1208291 1.306536e-04
## F09 18.10 3.288173 5.5045761 1.047769e-06
## F06 17.00 3.288173 5.1700439 3.499774e-06
## F01 17.25 3.288173 5.2460739 2.665260e-06
## F05 19.60 3.288173 5.9607565 1.971127e-07
## F07 16.95 3.288173 5.1548379 3.695247e-06
## F02 14.20 3.288173 4.3185072 6.763806e-05
## F08 21.45 3.288173 6.5233789 2.443813e-08
## F03 14.40 3.288173 4.3793313 5.509579e-05
## F04 19.65 3.288173 5.9759625 1.863600e-07
## F11 18.95 3.288173 5.7630783 4.078189e-07
## age
## Estimate Std. Error t value Pr(>|t|)
## M16 0.550 0.2929338 1.8775576 6.584707e-02
## M05 0.850 0.2929338 2.9016799 5.361639e-03
## M02 0.775 0.2929338 2.6456493 1.065760e-02
## M11 0.325 0.2929338 1.1094659 2.721458e-01
## M07 0.800 0.2929338 2.7309929 8.511442e-03
## M08 0.375 0.2929338 1.2801529 2.059634e-01
## M03 0.750 0.2929338 2.5603058 1.328807e-02
## M12 1.000 0.2929338 3.4137411 1.222240e-03
## M13 1.950 0.2929338 6.6567951 1.485652e-08
## M14 0.525 0.2929338 1.7922141 7.870160e-02
## M09 0.975 0.2929338 3.3283976 1.577941e-03
## M15 1.125 0.2929338 3.8404587 3.247135e-04
## M06 0.675 0.2929338 2.3042752 2.508117e-02
## M04 0.175 0.2929338 0.5974047 5.527342e-01
## M01 0.950 0.2929338 3.2430540 2.030113e-03
## M10 0.750 0.2929338 2.5603058 1.328807e-02
## F10 0.450 0.2929338 1.5361835 1.303325e-01
## F09 0.275 0.2929338 0.9387788 3.520246e-01
## F06 0.375 0.2929338 1.2801529 2.059634e-01
## F01 0.375 0.2929338 1.2801529 2.059634e-01
## F05 0.275 0.2929338 0.9387788 3.520246e-01
## F07 0.550 0.2929338 1.8775576 6.584707e-02
## F02 0.800 0.2929338 2.7309929 8.511442e-03
## F08 0.175 0.2929338 0.5974047 5.527342e-01
## F03 0.850 0.2929338 2.9016799 5.361639e-03
## F04 0.475 0.2929338 1.6215270 1.107298e-01
## F11 0.675 0.2929338 2.3042752 2.508117e-02
##
## Residual standard error: 1.31004 on 54 degrees of freedom
plot(fitslm)
However, we would rather adjust the intercept to the expected distance at the age 8, so we redefine the fitted model as follows:
fitslm2 <- lmList(distance ~ I(age - 8), data = Ort2)
or
fitslm2 <- lmList(distance ~ I(age - 8) | Subject, data = Ort)
summary(fitslm2)
## Call:
## Model: distance ~ I(age - 8) | Subject
## Data: Ort
##
## Coefficients:
## (Intercept)
## Estimate Std. Error t value Pr(>|t|)
## M16 21.35 1.096058 19.47890 4.359278e-26
## M05 20.45 1.096058 18.65778 3.319674e-25
## M02 21.05 1.096058 19.20519 8.514661e-26
## M11 22.65 1.096058 20.66497 2.596148e-27
## M07 21.35 1.096058 19.47890 4.359278e-26
## M08 22.75 1.096058 20.75621 2.100698e-27
## M03 22.00 1.096058 20.07194 1.046935e-26
## M12 21.25 1.096058 19.38766 5.444880e-26
## M13 18.40 1.096058 16.78744 4.366662e-23
## M14 23.30 1.096058 21.25800 6.639625e-28
## M09 22.20 1.096058 20.25441 6.794018e-27
## M15 22.50 1.096058 20.52812 3.571686e-27
## M06 24.35 1.096058 22.21598 7.809554e-29
## M04 26.10 1.096058 23.81261 2.589060e-30
## M01 24.90 1.096058 22.71778 2.621421e-29
## M10 27.25 1.096058 24.86183 3.054764e-31
## F10 17.15 1.096058 15.64699 1.031736e-21
## F09 20.30 1.096058 18.52092 4.686286e-25
## F06 20.00 1.096058 18.24721 9.391685e-25
## F01 20.25 1.096058 18.47530 5.259209e-25
## F05 21.80 1.096058 19.88946 1.618187e-26
## F07 21.35 1.096058 19.47890 4.359278e-26
## F02 20.60 1.096058 18.79463 2.355967e-25
## F08 22.85 1.096058 20.84744 1.701036e-27
## F03 21.20 1.096058 19.34205 6.087007e-26
## F04 23.45 1.096058 21.39486 4.867852e-28
## F11 24.35 1.096058 22.21598 7.809554e-29
## I(age - 8)
## Estimate Std. Error t value Pr(>|t|)
## M16 0.550 0.2929338 1.8775576 6.584707e-02
## M05 0.850 0.2929338 2.9016799 5.361639e-03
## M02 0.775 0.2929338 2.6456493 1.065760e-02
## M11 0.325 0.2929338 1.1094659 2.721458e-01
## M07 0.800 0.2929338 2.7309929 8.511442e-03
## M08 0.375 0.2929338 1.2801529 2.059634e-01
## M03 0.750 0.2929338 2.5603058 1.328807e-02
## M12 1.000 0.2929338 3.4137411 1.222240e-03
## M13 1.950 0.2929338 6.6567951 1.485652e-08
## M14 0.525 0.2929338 1.7922141 7.870160e-02
## M09 0.975 0.2929338 3.3283976 1.577941e-03
## M15 1.125 0.2929338 3.8404587 3.247135e-04
## M06 0.675 0.2929338 2.3042752 2.508117e-02
## M04 0.175 0.2929338 0.5974047 5.527342e-01
## M01 0.950 0.2929338 3.2430540 2.030113e-03
## M10 0.750 0.2929338 2.5603058 1.328807e-02
## F10 0.450 0.2929338 1.5361835 1.303325e-01
## F09 0.275 0.2929338 0.9387788 3.520246e-01
## F06 0.375 0.2929338 1.2801529 2.059634e-01
## F01 0.375 0.2929338 1.2801529 2.059634e-01
## F05 0.275 0.2929338 0.9387788 3.520246e-01
## F07 0.550 0.2929338 1.8775576 6.584707e-02
## F02 0.800 0.2929338 2.7309929 8.511442e-03
## F08 0.175 0.2929338 0.5974047 5.527342e-01
## F03 0.850 0.2929338 2.9016799 5.361639e-03
## F04 0.475 0.2929338 1.6215270 1.107298e-01
## F11 0.675 0.2929338 2.3042752 2.508117e-02
##
## Residual standard error: 1.31004 on 54 degrees of freedom
Let us now plot the estimated intercepts and slopes against each other.
pairs(fitslm2, id = 0.01, adj = -0.5)
The arguments id
and adj
are used to identify outliers. The subject M13 had an unusually large slope (fast skull growth). The intercepts and slopes, however, do not seem to be highly correlated. This would be different if we did not shift age by 8 years:
pairs(fitslm, id = 0.01, adj = -0.5)
This plot shows a relatively large correlation between the intercepts and slopes.
Next, we will calculate and plot confidence intervals for the subject-specific intercepts and slopes.
plot(intervals(fitslm2))
This plot tells us better how much the intercepts and slopes vary between individuals. It seems that the variability in the slopes is not as large as that in the intercepts.
Even better insight into the structure of the subject-specific coefficients is obtained if we use detrended outcomes. In our case, we may want to adjust the outcomes for Sex
. This means, we take residuals from the overall OLS as outcomes when fitting the subject-specific models.
Ort <- transform(Ort, resdistance = residuals(fitlm))
resfitslm <- lmList(resdistance ~ age | Subject, data = Ort)
resfitslm2 <- lmList(resdistance ~ I(age - 8) | Subject, data = Ort)
plot(resfitslm)
pairs(resfitslm, id = 0.01, adj = -0.5)
pairs(resfitslm2, id = 0.01, adj = -0.5)
plot(intervals(resfitslm2))
The R function for fitting linear mixed models is lme
(from package nlme
). The function accepts two arguments that specify the structure of the model: the argument fixed
for the mean structure (regression matrix \(\mathbb{X}_i\)) and the argument random
for the random effects (regression matrix \(\mathbb{Z}_i\)).
For example, the model with random intercept and random slope (in age, starting at 8 years) for Orthodont data has the formula
\[
Y_{ij}=\beta_1+(t_j-8)\beta_2+b_{i1}+(t_j-8)b_{i2}+\varepsilon_{ij},
\]
where \(Y_{ij}\) is the \(j\)-th distance
measurement on the \(i\)-th subject, \(t_j\) is the age at the \(j\)-th measurement (the same for all subjects), \(\beta_1\) is the population mean distance at 8 years, \(\beta_2\) is the mean growth in distance in one year, \(b_{i1}\) is the zero-mean random intercept for the \(i\)-th subject (the deviation of \(Y_{ij}-\varepsilon_{ij}\) from \(\beta_1\) at the age 8), \(b_{i2}\) is the zero-mean random slope for the \(i\)-th subject (the deviation of the growth rate from \(\beta_2\)), and \(\varepsilon_{ij}\) are uncorrelated random errors.
In the function lme
, this is specified as
fit.1 <- lme(fixed = distance ~ I(age - 8), random = ~ I(age - 8) | Subject, data = Ort2)
Notice that the argument random
does not specify the response in the formula, but includes the specification of groups (variable Subject
after the vertical bar). The results of the fit can be displayed by the function summary
:
summary(fit.1)
## Linear mixed-effects model fit by REML
## Data: Ort2
## AIC BIC logLik
## 454.6367 470.6173 -221.3183
##
## Random effects:
## Formula: ~I(age - 8) | Subject
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 1.886629 (Intr)
## I(age - 8) 0.226428 0.209
## Residual 1.310039
##
## Fixed effects: distance ~ I(age - 8)
## Value Std.Error DF t-value p-value
## (Intercept) 22.042593 0.4199079 80 52.49387 0
## I(age - 8) 0.660185 0.0712533 80 9.26533 0
## Correlation:
## (Intr)
## I(age - 8) -0.208
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.223105126 -0.493761342 0.007316612 0.472151020 3.916033672
##
## Number of Observations: 108
## Number of Groups: 27
We can see in two separate tables the estimates of standard deviations of the random effects, and estimates, standard errors, and Wald test statistics for the fixed effects . The first line of the output shows the estimation method for random effects. The default is REML. ML fit is obtained by specifying the argument method = "ML"
in lme()
. For example,
fixef(fit.1)
## (Intercept) I(age - 8)
## 22.0425926 0.6601852
the random effects (empirical Bayes) estimates by
ranef(fit.1)
## (Intercept) I(age - 8)
## M16 -0.7385868 -0.068853784
## M05 -0.9718653 0.025600396
## M02 -0.6114390 0.014507868
## M11 -0.0601169 -0.118826058
## M07 -0.3287723 0.034900066
## M08 0.1024078 -0.094736351
## M03 0.1129163 0.035852391
## M12 -0.0814824 0.114564208
## M13 -0.8201976 0.413669111
## M14 0.7913862 -0.014119883
## M09 0.6428730 0.135908731
## M15 1.1304470 0.208177854
## M06 1.8831736 0.083191273
## M04 2.4733754 -0.065885016
## M01 2.7770593 0.215684665
## M10 4.3424093 0.211146622
## F10 -4.2861070 -0.250590709
## F09 -2.0352814 -0.218041865
## F06 -2.1130409 -0.186557138
## F01 -1.9116365 -0.178209794
## F05 -0.8268549 -0.167957799
## F07 -0.7385868 -0.068853784
## F02 -0.9329855 0.009858033
## F08 -0.1448821 -0.174400493
## F03 -0.3676520 0.050642429
## F04 0.8302660 -0.029862247
## F11 1.8831736 0.083191273
The sum of fixed and random effects (estimated subject-specific intercepts and slopes) is calculated by
coef(fit.1)
## (Intercept) I(age - 8)
## M16 21.30401 0.5913314
## M05 21.07073 0.6857856
## M02 21.43115 0.6746931
## M11 21.98248 0.5413591
## M07 21.71382 0.6950853
## M08 22.14500 0.5654488
## M03 22.15551 0.6960376
## M12 21.96111 0.7747494
## M13 21.22240 1.0738543
## M14 22.83398 0.6460653
## M09 22.68547 0.7960939
## M15 23.17304 0.8683630
## M06 23.92577 0.7433765
## M04 24.51597 0.5943002
## M01 24.81965 0.8758699
## M10 26.38500 0.8713318
## F10 17.75649 0.4095945
## F09 20.00731 0.4421433
## F06 19.92955 0.4736280
## F01 20.13096 0.4819754
## F05 21.21574 0.4922274
## F07 21.30401 0.5913314
## F02 21.10961 0.6700432
## F08 21.89771 0.4857847
## F03 21.67494 0.7108276
## F04 22.87286 0.6303229
## F11 23.92577 0.7433765
Confidence intervals for fixed effects and variance parameters (standard deviations and correlations of random effects and error terms) are calculated by function intervals
.
intervals(fit.1)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 21.2069492 22.0425926 22.8782360
## I(age - 8) 0.5183866 0.6601852 0.8019837
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: Subject
## lower est. upper
## sd((Intercept)) 1.3043572 1.8866291 2.7288302
## sd(I(age - 8)) 0.1024815 0.2264280 0.5002817
## cor((Intercept),I(age - 8)) -0.6100575 0.2085642 0.8118229
##
## Within-group standard error:
## lower est. upper
## 1.084958 1.310039 1.581815
Model with the mean intercept and slope depending on Sex
is obtained as follows (the name of the first argument fixed
is usually omitted).
fit.2 <- lme(distance ~ I(age - 8)*Sex, random = ~ I(age - 8) | Subject, data = Ort2)
summary(fit.2)
## Linear mixed-effects model fit by REML
## Data: Ort2
## AIC BIC logLik
## 448.5817 469.7368 -216.2908
##
## Random effects:
## Formula: ~I(age - 8) | Subject
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 1.7983213 (Intr)
## I(age - 8) 0.1803446 -0.091
## Residual 1.3100400
##
## Fixed effects: distance ~ I(age - 8) * Sex
## Value Std.Error DF t-value p-value
## (Intercept) 22.615625 0.5265040 79 42.95433 0.0000
## I(age - 8) 0.784375 0.0859994 79 9.12070 0.0000
## SexFemale -1.406534 0.8248732 25 -1.70515 0.1006
## I(age - 8):SexFemale -0.304830 0.1347352 79 -2.26243 0.0264
## Correlation:
## (Intr) I(g-8) SexFml
## I(age - 8) -0.396
## SexFemale -0.638 0.253
## I(age - 8):SexFemale 0.253 -0.638 -0.396
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.168084303 -0.385940430 0.007102876 0.445155640 3.849462346
##
## Number of Observations: 108
## Number of Groups: 27
The fitted values can be obtained by calling the function fitted
. There are two types of the fitted values, the argument level
decides, which of them are calculated. The variant
fitted(fit.1, level = 0)
## M01 M01 M01 M01 M02 M02 M02 M02
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M03 M03 M03 M03 M04 M04 M04 M04
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M05 M05 M05 M05 M06 M06 M06 M06
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M07 M07 M07 M07 M08 M08 M08 M08
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M09 M09 M09 M09 M10 M10 M10 M10
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M11 M11 M11 M11 M12 M12 M12 M12
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M13 M13 M13 M13 M14 M14 M14 M14
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## M15 M15 M15 M15 M16 M16 M16 M16
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F01 F01 F01 F01 F02 F02 F02 F02
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F03 F03 F03 F03 F04 F04 F04 F04
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F05 F05 F05 F05 F06 F06 F06 F06
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F07 F07 F07 F07 F08 F08 F08 F08
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F09 F09 F09 F09 F10 F10 F10 F10
## 22.04259 23.36296 24.68333 26.00370 22.04259 23.36296 24.68333 26.00370
## F11 F11 F11 F11
## 22.04259 23.36296 24.68333 26.00370
## attr(,"label")
## [1] "Fitted values [mm]"
generates the population-level fitted values \(\widehat{\beta}_1+(t_j-8)\widehat{\beta}_2\), the variant
fitted(fit.1, level = 1)
## M01 M01 M01 M01 M02 M02 M02 M02
## 24.81965 26.57139 28.32313 30.07487 21.43115 22.78054 24.12993 25.47931
## M03 M03 M03 M03 M04 M04 M04 M04
## 22.15551 23.54758 24.93966 26.33173 24.51597 25.70457 26.89317 28.08177
## M05 M05 M05 M05 M06 M06 M06 M06
## 21.07073 22.44230 23.81387 25.18544 23.92577 25.41252 26.89927 28.38602
## M07 M07 M07 M07 M08 M08 M08 M08
## 21.71382 23.10399 24.49416 25.88433 22.14500 23.27590 24.40680 25.53769
## M09 M09 M09 M09 M10 M10 M10 M10
## 22.68547 24.27765 25.86984 27.46203 26.38500 28.12767 29.87033 31.61299
## M11 M11 M11 M11 M12 M12 M12 M12
## 21.98248 23.06519 24.14791 25.23063 21.96111 23.51061 25.06011 26.60961
## M13 M13 M13 M13 M14 M14 M14 M14
## 21.22240 23.37010 25.51781 27.66552 22.83398 24.12611 25.41824 26.71037
## M15 M15 M15 M15 M16 M16 M16 M16
## 23.17304 24.90977 26.64649 28.38322 21.30401 22.48667 23.66933 24.85199
## F01 F01 F01 F01 F02 F02 F02 F02
## 20.13096 21.09491 22.05886 23.02281 21.10961 22.44969 23.78978 25.12987
## F03 F03 F03 F03 F04 F04 F04 F04
## 21.67494 23.09660 24.51825 25.93991 22.87286 24.13350 25.39415 26.65480
## F05 F05 F05 F05 F06 F06 F06 F06
## 21.21574 22.20019 23.18465 24.16910 19.92955 20.87681 21.82406 22.77132
## F07 F07 F07 F07 F08 F08 F08 F08
## 21.30401 22.48667 23.66933 24.85199 21.89771 22.86928 23.84085 24.81242
## F09 F09 F09 F09 F10 F10 F10 F10
## 20.00731 20.89160 21.77588 22.66017 17.75649 18.57567 19.39486 20.21405
## F11 F11 F11 F11
## 23.92577 25.41252 26.89927 28.38602
## attr(,"label")
## [1] "Fitted values [mm]"
[same as fitted(fit.1)
] generates the subject-level fitted values \(\widehat{\beta}_1+(t_j-8)\widehat{\beta}_2+\widehat{b}_{i1}+(t_j-8)\widehat{b}_{i2}\).
The residuals are calculated by the function resid
, which also takes the level
argument. Standardized residuals are obtained by specifying the argument type = "pearson"
. For example, this command generates the standardized subject-level residuals:
resid(fit.1, level = 1, type = "pearson")
## M01 M01 M01 M01 M02
## 0.901001924 -1.199499401 0.516678092 0.706184057 0.052552954
## M02 M02 M02 M03 M03
## -0.214145956 -0.862512747 0.779127752 0.644630246 -0.799658385
## M03 M03 M04 M04 M04
## -0.717275490 0.891778933 0.751146791 1.370517172 -0.300119738
## M04 M05 M05 M05 M05
## -0.825753002 -0.817324439 0.807381412 -1.002923672 0.621782180
## M06 M06 M06 M06 M07
## 0.438333232 0.066777324 0.076889297 0.087001269 0.218451184
## M07 M07 M07 M08 M08
## -0.842715677 0.004456870 0.469961536 1.415987558 -1.355606487
## M08 M08 M09 M09 M09
## 0.071146168 -0.028772705 0.240095394 -2.883617929 3.916033672
## M09 M10 M10 M10 M10
## -1.116019061 0.851117962 -0.097451617 0.862318213 -0.086251366
## M11 M11 M11 M11 M12
## 0.776712695 -0.049764870 -0.494574553 -0.176048473 -0.351981898
## M12 M12 M12 M13 M13
## -0.008098210 -0.809218167 1.061337048 -3.223105126 0.862490321
## M13 M13 M14 M14 M14
## 0.368071188 1.400323581 -0.254937990 1.048739799 0.062410298
## M14 M15 M15 M15 M15
## -0.542251321 -0.132087308 -0.312788790 -0.493490272 1.234147654
## M16 M16 M16 M16 F01
## 0.531277285 -0.753159410 -0.129256696 0.112978136 0.663372276
## F01 F01 F01 F02 F02
## -0.835781594 -0.426596055 -0.017410517 -0.083666976 -0.724934997
## F02 F02 F03 F03 F03
## 0.160468509 0.282536251 -0.896874160 0.689600732 -0.013931665
## F03 F04 F04 F04 F04
## 0.045871701 0.478719473 0.279759153 -0.300869048 -0.118161486
## F05 F05 F05 F05 F06
## 0.216987553 0.610521663 -0.522615755 -0.510749527 0.053775714
## F06 F06 F06 F07 F07
## 0.094037024 -0.629037430 -0.207108239 0.149609404 0.010176354
## F07 F07 F08 F08 F08
## -0.510924578 0.112978136 0.841416994 0.099783336 -0.260182441
## F08 F09 F09 F09 F09
## -0.620148217 -0.005580887 0.082747259 0.171075404 -0.885600095
## F10 F10 F10 F10 F11
## -0.959120398 0.323902786 -0.301413439 -0.545061782 0.438333232
## F11 F11 F11
## -0.314890558 0.840225060 -0.294666612
## attr(,"label")
## [1] "Standardized residuals"
The basic plot of the standardized residuals versus fitted values, both evaluated at the innermost level of nesting, is obtained by the function plot
:
plot(fit.1)
However, the plot
function has additional arguments for lme
objects that allow considerable flexibility in creating residual plots. For example, a plot of standardized residuals versus fitted values by gender is created by the call
plot(fit.1, resid(., type = "p") ~ fitted(.) | Sex, abline = 0)
and box-plots of residuals by subject by the call
plot(fit.1, Subject ~ resid(.))
The formula argument that specifies the type of the requested plot is explained in the help to lattice
library functions dotplot, xyplot, and bwplot.
A plot of observed versus fitted values by subject can be obtained by another application of the plot
function:
plot(fit.1, distance ~ fitted(.) | Subject, abline = c(0, 1))
A scatter plot of the coefficients \(\widehat{\beta}_1+\widehat{b}_{i1}\) versus \(\widehat{\beta}_2+\widehat{b}_{i2}\) is obtained by a plain call to the pairs
function
pairs(fit.1)
This is the same as specifying pairs(fit.1, form = ~ coef(.))
. Other values of the form
argument generate alternative plots such as a scatter plot of coefficients by gender for identifying unusual subjects
pairs(fit.1, ~ coef(., augFrame = TRUE) | Sex, id = 0.1, adj = -0.5)
or a scatter plot of estimated random effects
pairs(fit.1, ~ranef(.))
or estimated random effects by Sex
pairs(fit.1, ~ranef(.) | Sex)
A plot of random effects by subject is generated by the function plot
:
plot(ranef(fit.1))
A plot of random effects by gender is another variant:
fit1RE <- ranef(fit.1, aug = TRUE)
print(fit1RE)
## (Intercept) I(age - 8) distance age Sex
## M16 -0.7385868 -0.068853784 23.000 11 Male
## M05 -0.9718653 0.025600396 23.000 11 Male
## M02 -0.6114390 0.014507868 23.375 11 Male
## M11 -0.0601169 -0.118826058 23.625 11 Male
## M07 -0.3287723 0.034900066 23.750 11 Male
## M08 0.1024078 -0.094736351 23.875 11 Male
## M03 0.1129163 0.035852391 24.250 11 Male
## M12 -0.0814824 0.114564208 24.250 11 Male
## M13 -0.8201976 0.413669111 24.250 11 Male
## M14 0.7913862 -0.014119883 24.875 11 Male
## M09 0.6428730 0.135908731 25.125 11 Male
## M15 1.1304470 0.208177854 25.875 11 Male
## M06 1.8831736 0.083191273 26.375 11 Male
## M04 2.4733754 -0.065885016 26.625 11 Male
## M01 2.7770593 0.215684665 27.750 11 Male
## M10 4.3424093 0.211146622 29.500 11 Male
## F10 -4.2861070 -0.250590709 18.500 11 Female
## F09 -2.0352814 -0.218041865 21.125 11 Female
## F06 -2.1130409 -0.186557138 21.125 11 Female
## F01 -1.9116365 -0.178209794 21.375 11 Female
## F05 -0.8268549 -0.167957799 22.625 11 Female
## F07 -0.7385868 -0.068853784 23.000 11 Female
## F02 -0.9329855 0.009858033 23.000 11 Female
## F08 -0.1448821 -0.174400493 23.375 11 Female
## F03 -0.3676520 0.050642429 23.750 11 Female
## F04 0.8302660 -0.029862247 24.875 11 Female
## F11 1.8831736 0.083191273 26.375 11 Female
plot(fit1RE, form = ~ Sex)
Histograms of estimated random effects may help in evaluating the normality assumption of the random effects.
par(mfrow = c(1, 2))
hist(fit1RE[, "(Intercept)"], xlab = expression(hat(b)[0]), prob = TRUE, main = "Intercept", col = "salmon")
hist(fit1RE[, "I(age - 8)"], xlab = expression(hat(b)[1]), prob = TRUE, main = "Slope", col = "salmon")
par(mfrow = c(1, 1))
Nevertheless, be aware of the shrinkage effect of the empirical Bayes estimates of the random effects. It causes that the empirical distribution of those estimators may look normal even if the true distribution of random effects is not. The same warning holds indeed for interpretation of the QQ plots based on the empirical Bayes estimates of the random effects.
par(mfrow = c(1, 2))
qqnorm(fit1RE[, "(Intercept)"], main = "Intercept", col = "red4", bg = "salmon", pch = 21)
qqline(fit1RE[, "(Intercept)"], col = "darkblue", lwd = 2)
qqnorm(fit1RE[, "I(age - 8)"], main = "Slope", col = "red4", bg = "salmon", pch = 21)
qqline(fit1RE[, "I(age - 8)"], col = "darkblue", lwd = 2)
par(mfrow = c(1, 1))
For the next illustration (a boxplot of random slopes by gender), we need to recode the right-hand side of fit.1
.
Ort2$age.m.8 <- Ort2$age - 8
fit.1x=lme(fixed= distance ~ age.m.8, random = ~ age.m.8 | Subject, data=Ort2)
fit1xRE <- ranef(fit.1x, aug = TRUE)
print(fit1xRE)
## (Intercept) age.m.8 distance age Sex
## M16 -0.7385868 -0.068853784 23.000 11 Male
## M05 -0.9718653 0.025600396 23.000 11 Male
## M02 -0.6114390 0.014507868 23.375 11 Male
## M11 -0.0601169 -0.118826058 23.625 11 Male
## M07 -0.3287723 0.034900066 23.750 11 Male
## M08 0.1024078 -0.094736351 23.875 11 Male
## M03 0.1129163 0.035852391 24.250 11 Male
## M12 -0.0814824 0.114564208 24.250 11 Male
## M13 -0.8201976 0.413669111 24.250 11 Male
## M14 0.7913862 -0.014119883 24.875 11 Male
## M09 0.6428730 0.135908731 25.125 11 Male
## M15 1.1304470 0.208177854 25.875 11 Male
## M06 1.8831736 0.083191273 26.375 11 Male
## M04 2.4733754 -0.065885016 26.625 11 Male
## M01 2.7770593 0.215684665 27.750 11 Male
## M10 4.3424093 0.211146622 29.500 11 Male
## F10 -4.2861070 -0.250590709 18.500 11 Female
## F09 -2.0352814 -0.218041865 21.125 11 Female
## F06 -2.1130409 -0.186557138 21.125 11 Female
## F01 -1.9116365 -0.178209794 21.375 11 Female
## F05 -0.8268549 -0.167957799 22.625 11 Female
## F07 -0.7385868 -0.068853784 23.000 11 Female
## F02 -0.9329855 0.009858033 23.000 11 Female
## F08 -0.1448821 -0.174400493 23.375 11 Female
## F03 -0.3676520 0.050642429 23.750 11 Female
## F04 0.8302660 -0.029862247 24.875 11 Female
## F11 1.8831736 0.083191273 26.375 11 Female
plot(fit1xRE, form = age.m.8 ~ Sex)
We can compare the random intercepts and slopes estimated by the LME model to individual-specific intercepts and slopes estimated by the sequence of linear models. Numerical summaries are created by function compareFits
, which has a plotting method to display them graphically:
cmp.1 <- compareFits(coef(fitslm2), coef(fit.1))
plot(cmp.1, mark = fixef(fit.1))
Notice that the purple dots (coefficients from the LME model) are all shifted towards the fixed effect estimates shown by vertical dashed lines (shrinkage estimators).
We can also compare graphical representations of the fitted lines by the command comparePred
.
cmp.2 <- comparePred(fitslm2, fit.1, length.out = 2)
## Error in sprintf(gettext(fmt, domain = domain), ...): invalid type of argument[1]: 'symbol'
plot(cmp.2, layout = c(8, 4), between = list(y = c(0, 0.5)))
## Error in plot(cmp.2, layout = c(8, 4), between = list(y = c(0, 0.5))): object 'cmp.2' not found
By default, the variance matrix of random effects \(D\) is any positive definite matrix. The user can make the random effects uncorrelated (requiring \(\mathsf{cor}(b_{i1},b_{i2})=0\)) by enclosing the random effect formula in the pdDiag
function. This requests a diagonal variance matrix of the random effects. There are several other built-in functions for specification of the structure of this matrix, their names all start with the prefix pd
(see help to pdClasses
).
fit.1.uncor <- lme(fixed = distance ~ age.m.8, random = pdDiag(~ age.m.8), data = Ort2)
summary(fit.1.uncor)
## Linear mixed-effects model fit by REML
## Data: Ort2
## AIC BIC logLik
## 452.8916 466.2088 -221.4458
##
## Random effects:
## Formula: ~age.m.8 | Subject
## Structure: Diagonal
## (Intercept) age.m.8 Residual
## StdDev: 1.962957 0.2469767 1.293767
##
## Fixed effects: distance ~ age.m.8
## Value Std.Error DF t-value p-value
## (Intercept) 22.042593 0.4314009 80 51.09538 0
## age.m.8 0.660185 0.0732042 80 9.01841 0
## Correlation:
## (Intr)
## age.m.8 -0.294
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.98617508 -0.49418192 0.01618754 0.42884417 3.96519081
##
## Number of Observations: 108
## Number of Groups: 27
Notice that the grouping factor | Subject
cannot be specified in the random
term when the pdDiag
function is used. The grouping is extracted from the groupedData
object Ort2
.