NMSA407 Linear Regression: Tutorial

Matrix algebra background of linear regression




Dataset

Everything in this tutorial will be illustrated using pseudodata that we now create.

y <- c(-9, -11, 1, 19)
x <- c(-3, -1, 1, 3)
x2 <- c(9, 1, 1, 9)
x2 <- x^2                     ### the same
X <- cbind(1, x, x2)
colnames(X) <- c("intcpt", "x", "x^2")
print(X)
##      intcpt  x x^2
## [1,]      1 -3   9
## [2,]      1 -1   1
## [3,]      1  1   1
## [4,]      1  3   9
Data <- data.frame(y = y, x = x, x2 = x2)
print(Data)
##     y  x x2
## 1  -9 -3  9
## 2 -11 -1  1
## 3   1  1  1
## 4  19  3  9




Linear model

We will be fitting the following linear model: \[ \mathsf{E}Y = \beta_0 + \beta_1\,x + \beta_2\,x^2. \]
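The model matrix of this model is exactly the matrix \(\mathbf{X}\) created above. As a small cross-check (not part of the original derivations), the same matrix, up to column names, could also be obtained from a model formula using the base R function model.matrix:

model.matrix(~ x + x2, data = Data)   ## columns (Intercept), x, x2; equal to X up to column names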




Raw solution to normal equations

First, we calculate the solution to the normal equations directly from the theoretically derived expression, i.e., as \(\widehat{\beta} = \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\mathbf{X}^\top\mathbf{Y}\).

Matrix \(\mathbf{X}^\top\mathbf{X}\)

XtX <- t(X) %*% X
XtX <- crossprod(X)    ## the same
print(XtX)
##        intcpt  x x^2
## intcpt      4  0  20
## x           0 20   0
## x^2        20  0 164

Matrix \(\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\)

invXtX <- solve(XtX)
print(invXtX)
##           intcpt    x       x^2
## intcpt  0.640625 0.00 -0.078125
## x       0.000000 0.05  0.000000
## x^2    -0.078125 0.00  0.015625

Solution to normal equations using already calculated inverse

b <- invXtX %*% t(X) %*% y
print(b)
##         [,1]
## intcpt -6.25
## x       4.80
## x^2     1.25

Solution to normal equations calculated directly as a solution to a linear system

solve(XtX, t(X) %*% y)
##         [,1]
## intcpt -6.25
## x       4.80
## x^2     1.25
solve(crossprod(X), crossprod(X, y))   ## the same
##         [,1]
## intcpt -6.25
## x       4.80
## x^2     1.25




Linear algebra and solution to normal equations

Calculating the solution to the normal equations via the above “raw” approach is numerically suboptimal. In practice, numerically more stable and more efficient methods are used, based on decompositions of the model matrix \(\mathbf{X}\) known from linear algebra. An additional advantage of these methods is that once the decomposition of the model matrix \(\mathbf{X}\) is available, calculation of everything else (fitted values, residuals, solution to the normal equations, …) is trivial (and computationally fast). This is particularly useful when least squares solutions are needed for a series of “response vectors” \(\mathbf{Y}\) while the model matrix \(\mathbf{X}\) is kept fixed, as the sketch below illustrates.
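A small sketch of the last point (the object names qrX0 and Y2 are auxiliary and used only here): the QR decomposition of \(\mathbf{X}\) can be computed once and then reused to obtain least squares solutions for several response vectors at no extra decomposition cost.

qrX0 <- qr(X)                        ## decomposition of the model matrix computed once
Y2 <- cbind(y1 = y, y2 = 2 * y)      ## two response vectors sharing the same model matrix
qr.coef(qrX0, Y2)                    ## least squares solutions for both columns at once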

The most often used decompositions are

- the singular value decomposition (SVD) of the model matrix \(\mathbf{X}\),
- the QR decomposition of the model matrix \(\mathbf{X}\).

In the following, the above two decompositions will be explored in more detail. First, however, we explore two other decompositions, this time of the symmetric positive semidefinite matrix \(\mathbf{X}^\top\mathbf{X}\), that can also be used to solve the normal equations \(\mathbf{X}^\top\mathbf{X}\,\mathbf{b}= \mathbf{X}^\top\mathbf{Y}\) and possibly for some other tasks.

These will be

- the spectral (eigenvalue) decomposition of \(\mathbf{X}^\top\mathbf{X}\),
- the Cholesky (square root) decomposition of \(\mathbf{X}^\top\mathbf{X}\).

Both the spectral decomposition and the Cholesky decomposition should be known from the bachelor courses Linear algebra and Fundamentals of Numerical Mathematics, the Cholesky decomposition perhaps in the more general form of the LU decomposition.




Spectral decomposition of a symmetric positive semidefinite matrix \(\mathbf{X}^\top\mathbf{X}\)

\[ \mathbf{X}^\top\mathbf{X} = \mathbf{V}\,\Lambda\,\mathbf{V}^\top = \sum_{j=1}^p \lambda_j\,\mathbf{v}_j\,\mathbf{v}_j^\top, \qquad \lambda_j \geq 0,\;\;j=1,\ldots,p. \]

Additionally, it holds that \[ \mathbf{V}\,\mathbf{V}^\top = \sum_{j=1}^p \mathbf{v}_j\,\mathbf{v}_j^\top = \mathbf{I}_p, \] \[ \mathbf{V}^\top\,\mathbf{V} = \bigl(\mathbf{v}_j^\top\,\mathbf{v}_l\bigr) = \mathbf{I}_p. \] That is, \[ \mathbf{V}^{-1} = \mathbf{V}^\top. \] If \(\lambda_j > 0\) for all \(j=1,\ldots,p\), then \[ \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1} = \sum_{j=1}^p \lambda_j^{-1}\,\mathbf{v}_j\,\mathbf{v}_j^\top = \mathbf{V}\,\Lambda^{-1}\,\mathbf{V}^\top. \]

Spectral decomposition of \(\mathbf{X}^\top\mathbf{X}\)

eigen(XtX)
## $values
## [1] 166.462113  20.000000   1.537887
## 
## $vectors
##            [,1] [,2]       [,3]
## [1,] -0.1221833    0  0.9925076
## [2,]  0.0000000   -1  0.0000000
## [3,] -0.9925076    0 -0.1221833

Elements of an object returned by the eigen function

e <- eigen(XtX)
names(e)
## [1] "values"  "vectors"

Does it satisfy what it should satisfy?

V <- e$vectors
Lambda <- diag(e$values)
V %*% Lambda %*% t(V)
##      [,1] [,2] [,3]
## [1,]    4    0   20
## [2,]    0   20    0
## [3,]   20    0  164
round(V %*% t(V), 13)             ## only numerically identity matrix
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
round(t(V) %*% V, 13)             ## only numerically identity matrix
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
round(XtX %*% V - V %*% Lambda, 13)    ## X'X * V = V * Lambda
##        [,1] [,2] [,3]
## intcpt    0    0    0
## x         0    0    0
## x^2       0    0    0
invLambda <- diag(1 / e$values)
V %*% invLambda %*% t(V)
##           [,1] [,2]      [,3]
## [1,]  0.640625 0.00 -0.078125
## [2,]  0.000000 0.05  0.000000
## [3,] -0.078125 0.00  0.015625
round(V %*% invLambda %*% t(V) - invXtX, 13)  ## V * Lambda^(-1) * V' = (X'X)^(-1)
##        intcpt x x^2
## intcpt      0 0   0
## x           0 0   0
## x^2         0 0   0
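The spectral decomposition can also be used to solve the normal equations directly, since \(\mathbf{b} = \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\mathbf{X}^\top\mathbf{Y} = \mathbf{V}\,\Lambda^{-1}\,\mathbf{V}^\top\,\mathbf{X}^\top\mathbf{Y}\). A brief illustration using the quantities computed above (the name bspec is auxiliary):

(bspec <- V %*% invLambda %*% t(V) %*% crossprod(X, y))   ## = V * Lambda^(-1) * V' * X'Y
round(bspec - b, 13)                                      ## coincides (numerically) with the previously calculated b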




Cholesky (square root) decomposition of a symmetric positive semidefinite matrix \(\mathbf{X}^\top\mathbf{X}\)

\[ \mathbf{X}^\top\mathbf{X} = \mathbf{L}\,\mathbf{L}^\top, \] where \(\mathbf{L}\) is a lower triangular matrix.

This is a special case of the so-called LU decomposition in which, for a symmetric positive semidefinite matrix, \(\mathbf{U} = \mathbf{L}^\top\) (or \(\mathbf{L} = \mathbf{U}^\top\)).

In statistics, symmetric positive semidefinite matrices are encountered quite often since all covariance matrices are of this type. The Cholesky decomposition is in fact a generalization of the square root to matrices (as will be shown immediately). The Cholesky decomposition of a covariance matrix can thus be viewed as a multivariate version of the standard deviation, and having it available simplifies many problems.
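A brief illustration of this interpretation (a sketch only; the matrix Sigma below is an arbitrary positive definite matrix made up for this example): if \(\Sigma = \mathbf{L}\,\mathbf{L}^\top\) and \(\mathbf{z}\) is a vector of independent standard normal variables, then \(\mu + \mathbf{L}\,\mathbf{z}\) has mean \(\mu\) and covariance matrix \(\Sigma\), in direct analogy with the univariate transformation \(\mu + \sigma z\).

Sigma <- matrix(c(4, 2, 2, 3), nrow = 2)   ## an arbitrary positive definite matrix (illustration only)
L <- t(chol(Sigma))                        ## chol() returns U = L', hence the transpose
set.seed(1)
z <- rnorm(2)                              ## z ~ N(0, I_2)
mu <- c(1, -1)
mu + as.numeric(L %*% z)                   ## one draw from N(mu, Sigma)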

If \(\mathbf{D}\) is a symmetric positive semidefinite matrix (e.g., a covariance matrix or a matrix \(\mathbf{X}^\top\mathbf{X}\) in a regression context) then it can be decomposed as \(\mathbf{D} = \mathbf{L}\,\mathbf{L}^\top\), where \(\mathbf{L}\) is a lower triangular matrix. Among other things, the following tasks become practically trivial once \(\mathbf{L}\) is available (all of them are demonstrated below for \(\mathbf{D} = \mathbf{X}^\top\mathbf{X}\)):

- calculation of \(\mathbf{D}^{-1}\),
- calculation of the determinant of \(\mathbf{D}\),
- solving the linear system \(\mathbf{D}\,\mathbf{b} = \mathbf{c}\) by one forward and one backward substitution.

Cholesky decomposition of a matrix \(\mathbf{X}^\top\mathbf{X}\)

The R function chol returns the upper triangular matrix \(\mathbf{U} = \mathbf{L}^\top\).

chol(XtX)
##        intcpt        x x^2
## intcpt      2 0.000000  10
## x           0 4.472136   0
## x^2         0 0.000000   8

Does it satisfy what it should satisfy?

U <- chol(XtX)
t(U) %*% U
##        intcpt  x x^2
## intcpt      4  0  20
## x           0 20   0
## x^2        20  0 164

Calculate \(\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\) from the Cholesky decomposition

chol2inv(U)
##           [,1] [,2]      [,3]
## [1,]  0.640625 0.00 -0.078125
## [2,]  0.000000 0.05  0.000000
## [3,] -0.078125 0.00  0.015625
invXtX2 <- chol2inv(U)
round(invXtX2 - invXtX, 13)      ## compare with previously calculated (X'X)^(-1), numerically the same
##        intcpt x x^2
## intcpt      0 0   0
## x           0 0   0
## x^2         0 0   0

Determinant of \(\mathbf{X}^\top\mathbf{X}\)

prod(diag(U))^2
## [1] 5120
det(XtX)                         ## R function to calculate determinant
## [1] 5120

Solution to normal equations via backward and forward substitution using the Cholesky decomposition of a matrix \(\mathbf{X}^\top\mathbf{X}\)

We need to solve \(\mathbf{X}^\top\mathbf{X}\,\mathbf{b} = \mathbf{X}^\top\mathbf{Y}\). Since \(\mathbf{X}^\top\mathbf{X} = \mathbf{U}^\top\mathbf{U}\), we have to solve \(\mathbf{U}^\top\mathbf{U}\,\mathbf{b} = \mathbf{X}^\top\mathbf{Y}\), which amounts to one forward substitution (solving \(\mathbf{U}^\top\mathbf{v} = \mathbf{X}^\top\mathbf{Y}\)) followed by one backward substitution (solving \(\mathbf{U}\,\mathbf{b} = \mathbf{v}\)).

(v <- forwardsolve(t(U), crossprod(X, y)))     ## solution to U' * v = X'Y
##          [,1]
## [1,]  0.00000
## [2,] 21.46625
## [3,] 10.00000
(bchol <- backsolve(U, v))                     ## solution to U * b  = v
##       [,1]
## [1,] -6.25
## [2,]  4.80
## [3,]  1.25
print(b)      ## compare to previously calculated solution
##         [,1]
## intcpt -6.25
## x       4.80
## x^2     1.25




Singular value decomposition (SVD) of the model matrix \(\mathbf{X}\)

Suppose that \(n = \mathrm{nrow}(\mathbf{X})\), \(k = \mathrm{ncol}(\mathbf{X})\), \(r = \mathrm{rank}(\mathbf{X}) \leq k \leq n\). The singular value decomposition (SVD) of the model matrix \(\mathbf{X}\) takes the following form: \[ \mathbf{X} = \mathbf{U}\,\mathbf{D}\,\mathbf{V}^\top, \] where

- \(\mathbf{U}\) is an \(n\times k\) matrix with orthonormal columns \(\mathbf{u}_1,\ldots,\mathbf{u}_k\) (i.e., \(\mathbf{U}^\top\mathbf{U} = \mathbf{I}_k\)),
- \(\mathbf{D}\) is a \(k\times k\) diagonal matrix with non-negative numbers \(d_1 \geq \cdots \geq d_k \geq 0\) on the diagonal,
- \(\mathbf{V}\) is a \(k\times k\) orthonormal matrix with columns \(\mathbf{v}_1,\ldots,\mathbf{v}_k\) (i.e., \(\mathbf{V}^\top\mathbf{V} = \mathbf{V}\,\mathbf{V}^\top = \mathbf{I}_k\)).

The numbers \(d_1,\ldots,d_k\) are called the singular values of the matrix \(\mathbf{X}\). It holds that

- \(d_1^2,\ldots,d_k^2\) are the eigenvalues of the matrix \(\mathbf{X}^\top\mathbf{X}\) (compare the output of svd(X) below with the earlier output of eigen(XtX)),
- the number of positive singular values equals \(r = \mathrm{rank}(\mathbf{X})\).

Further, \(\mathbf{X}^\top\mathbf{X} = \mathbf{V}\,\mathbf{D}^2\,\mathbf{V}^\top\), i.e., the SVD of \(\mathbf{X}\) directly provides the spectral decomposition of \(\mathbf{X}^\top\mathbf{X}\), with the columns of \(\mathbf{V}\) being its eigenvectors.

If \(d_j > 0\) for all \(j=1,\ldots,k\), the matrix \(\mathbf{X}^\top\,\mathbf{X}\) is invertible and \[ \bigl(\mathbf{X}^\top\,\mathbf{X}\bigr)^{-1} = \mathbf{V}\,\mathbf{D}^{-2}\,\mathbf{V}^\top = \sum_{j=1}^k \frac{1}{d_j^2}\,\mathbf{v}_j\,\mathbf{v}_j^\top, \] where \(\mathbf{D}^{-2}\) is a diagonal matrix with \(d_1^{-2},\ldots,d_k^{-2}\) on the diagonal.

Since matrix \(\mathbf{U}\) contains an orthonormal basis of \(M(\mathbf{X})\) in its columns, the projection matrix into \(M(\mathbf{X})\) is \[ \mathbf{H} = \mathbf{U}\,\mathbf{U}^\top. \]

The SVD can also be used to calculate the Moore-Penrose pseudoinverse of \(\mathbf{X}\): \[ \mathbf{X}^- = \tilde{\mathbf{V}}\,{\tilde{\mathbf{D}}}^{-1}\,{\tilde{\mathbf{U}}}^\top, \] where the matrices \(\tilde{\mathbf{V}}_{k\times r},\,\tilde{\mathbf{D}}_{r\times r},\,\tilde{\mathbf{U}}_{n\times r}\) contain only the columns (and singular values) corresponding to the positive singular values.

Singular value decomposition of a matrix \(\mathbf{X}\)

svd(X)
## $d
## [1] 12.902020  4.472136  1.240116
## 
## $u
##            [,1]       [,2]        [,3]
## [1,] 0.70180882  0.6708204 -0.08639661
## [2,] 0.08639661  0.2236068  0.70180882
## [3,] 0.08639661 -0.2236068  0.70180882
## [4,] 0.70180882 -0.6708204 -0.08639661
## 
## $v
##           [,1] [,2]       [,3]
## [1,] 0.1221833    0  0.9925076
## [2,] 0.0000000   -1  0.0000000
## [3,] 0.9925076    0 -0.1221833

Content of an object returned by the svd function

svdX <- svd(X)
names(svdX)
## [1] "d" "u" "v"
(U <- svdX$u)
##            [,1]       [,2]        [,3]
## [1,] 0.70180882  0.6708204 -0.08639661
## [2,] 0.08639661  0.2236068  0.70180882
## [3,] 0.08639661 -0.2236068  0.70180882
## [4,] 0.70180882 -0.6708204 -0.08639661
(D <- diag(svdX$d))
##          [,1]     [,2]     [,3]
## [1,] 12.90202 0.000000 0.000000
## [2,]  0.00000 4.472136 0.000000
## [3,]  0.00000 0.000000 1.240116
(V <- svdX$v)
##           [,1] [,2]       [,3]
## [1,] 0.1221833    0  0.9925076
## [2,] 0.0000000   -1  0.0000000
## [3,] 0.9925076    0 -0.1221833

Matrices \(\mathbf{D}^2,\,\mathbf{D}^{-1},\,\mathbf{D}^{-2}\)

(D2  <- diag(svdX$d^2))       ## = D * D 
##          [,1] [,2]     [,3]
## [1,] 166.4621    0 0.000000
## [2,]   0.0000   20 0.000000
## [3,]   0.0000    0 1.537887
(iD  <- diag(1 / svdX$d))     ## = D^(-1)
##            [,1]      [,2]      [,3]
## [1,] 0.07750724 0.0000000 0.0000000
## [2,] 0.00000000 0.2236068 0.0000000
## [3,] 0.00000000 0.0000000 0.8063762
(iD2 <- diag(1 / svdX$d^2))   ## = (D * D)^(-1)
##             [,1] [,2]      [,3]
## [1,] 0.006007373 0.00 0.0000000
## [2,] 0.000000000 0.05 0.0000000
## [3,] 0.000000000 0.00 0.6502426

Do the above matrices satisfy what they should satisfy?

U %*% D %*% t(V)              ### --> X
##      [,1] [,2] [,3]
## [1,]    1   -3    9
## [2,]    1   -1    1
## [3,]    1    1    1
## [4,]    1    3    9
V %*% D %*% t(D) %*% t(V)     ### --> X'X
##      [,1] [,2] [,3]
## [1,]    4    0   20
## [2,]    0   20    0
## [3,]   20    0  164
round(t(U) %*% U, 13)         ### --> I (numerically)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
round(t(V) %*% V, 13)         ### --> I (numerically)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
round(V %*% t(V), 13)         ### --> I (numerically)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

Moore-Penrose pseudoinverse of a matrix \(\mathbf{X}\)

Xmin <- V %*% diag(1/svdX$d) %*% t(U)
print(Xmin)
##         [,1]    [,2]    [,3]    [,4]
## [1,] -0.0625  0.5625  0.5625 -0.0625
## [2,] -0.1500 -0.0500  0.0500  0.1500
## [3,]  0.0625 -0.0625 -0.0625  0.0625
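Since \(\mathbf{X}\) has full column rank here, \(\mathbf{X}^- = \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\mathbf{X}^\top\), and hence \(\mathbf{X}^-\,\mathbf{Y}\) is the solution to the normal equations. A small cross-check (the name b_from_Xmin is auxiliary):

(b_from_Xmin <- Xmin %*% y)       ## should equal the previously calculated b = (-6.25, 4.80, 1.25)
round(b_from_Xmin - b, 13)        ## numerically zero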

Properties of the Moore-Penrose pseudoinverse

round(X %*% Xmin %*% X - X, 13)                    ## X * X^- * X = X
##      intcpt x x^2
## [1,]      0 0   0
## [2,]      0 0   0
## [3,]      0 0   0
## [4,]      0 0   0
round(Xmin %*% X %*% Xmin - Xmin, 13)              ## X^- * X * X^- = X^-
##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
round(t(X %*% Xmin) - X %*% Xmin, 13)              ## (X * X^-)' = X * X^-
##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0
round(t(Xmin %*% X) - Xmin %*% X, 13)              ## (X^- * X)' = X^- * X
##        [,1] [,2] [,3]
## intcpt    0    0    0
## x         0    0    0
## x^2       0    0    0

\(\mathbf{H}\) matrix to project into \(M(\mathbf{X})\) (regression space)

(H <- U %*% t(U))
##       [,1]  [,2]  [,3]  [,4]
## [1,]  0.95  0.15 -0.15  0.05
## [2,]  0.15  0.55  0.45 -0.15
## [3,] -0.15  0.45  0.55  0.15
## [4,]  0.05 -0.15  0.15  0.95

\(\mathbf{M}\) matrix to project into the orthogonal complement of \(M(\mathbf{X})\) (residual space)

(M <- diag(nrow(X)) - H)
##       [,1]  [,2]  [,3]  [,4]
## [1,]  0.05 -0.15  0.15 -0.05
## [2,] -0.15  0.45 -0.45  0.15
## [3,]  0.15 -0.45  0.45 -0.15
## [4,] -0.05  0.15 -0.15  0.05

Fitted values and residuals

(yhat <- H %*% y)
##      [,1]
## [1,] -9.4
## [2,] -9.8
## [3,] -0.2
## [4,] 19.4
(u <- M %*% y)
##      [,1]
## [1,]  0.4
## [2,] -1.2
## [3,]  1.2
## [4,] -0.4

Fitted values and residuals using the solution to normal equations (to compare)

(yhat_from_b <- X %*% b)
##      [,1]
## [1,] -9.4
## [2,] -9.8
## [3,] -0.2
## [4,] 19.4
(u_from_b <- y - yhat_from_b)
##      [,1]
## [1,]  0.4
## [2,] -1.2
## [3,]  1.2
## [4,] -0.4

Matrix \(\mathbf{X}^\top\mathbf{X}\)

V %*% D2 %*% t(V)         ## X'X
##      [,1] [,2] [,3]
## [1,]    4    0   20
## [2,]    0   20    0
## [3,]   20    0  164
crossprod(D %*% t(V))     ## X'X calculated in an even better way
##      [,1] [,2] [,3]
## [1,]    4    0   20
## [2,]    0   20    0
## [3,]   20    0  164
print(XtX)                ## previously calculated X'X
##        intcpt  x x^2
## intcpt      4  0  20
## x           0 20   0
## x^2        20  0 164

Matrix \(\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\)

V %*% iD2 %*% t(V)        ## (X'X)^(-1)
##           [,1] [,2]      [,3]
## [1,]  0.640625 0.00 -0.078125
## [2,]  0.000000 0.05  0.000000
## [3,] -0.078125 0.00  0.015625
crossprod(iD %*% t(V))    ## (X'X)^(-1) calculated in an even better way
##           [,1] [,2]      [,3]
## [1,]  0.640625 0.00 -0.078125
## [2,]  0.000000 0.05  0.000000
## [3,] -0.078125 0.00  0.015625
print(invXtX)             ## previously calculated (X'X)^(-1)
##           intcpt    x       x^2
## intcpt  0.640625 0.00 -0.078125
## x       0.000000 0.05  0.000000
## x^2    -0.078125 0.00  0.015625




QR decomposition of the model matrix \(\mathbf{X}\)

Still suppose that \(n = \mathrm{nrow}(\mathbf{X})\), \(k = \mathrm{ncol}(\mathbf{X})\), \(r = \mathrm{rank}(\mathbf{X}) \leq k \leq n\). The QR decomposition of the model matrix \(\mathbf{X}\) takes the following form: \[ \mathbf{X} = \mathbf{Q}\,\mathbf{R}, \] where

- \(\mathbf{Q}\) is an \(n\times k\) matrix with orthonormal columns (i.e., \(\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}_k\)); if \(r = k\), its columns form an orthonormal basis of the regression space \(M(\mathbf{X})\),
- \(\mathbf{R}\) is a \(k\times k\) upper triangular matrix.

Further, \[ \mathbf{X}^\top\mathbf{X} = \mathbf{R}^\top\,\mathbf{Q}^\top\mathbf{Q}\,\mathbf{R} = \mathbf{R}^\top\mathbf{R}, \] since the columns of the matrix \(\mathbf{Q}\) form an orthonormal basis and hence \(\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}_k\).

The system of normal equations is then easily solvable by one backward substitution since \[ \mathbf{X}^\top\mathbf{X}\,\mathbf{b} = \mathbf{X}^\top\mathbf{Y} \] \[ \mathbf{R}^\top\mathbf{R}\,\mathbf{b} = \mathbf{R}^\top\mathbf{Q}^\top\,\mathbf{Y} \] \[ \mathbf{R}\,\mathbf{b} = \mathbf{Q}^\top\,\mathbf{Y}, \] where the last step assumes that \(\mathbf{R}\) is invertible (i.e., \(r = k\)). That is, only one linear system with an upper triangular matrix needs to be solved.

If \(r = k\) then \(\mathbf{X}^\top\mathbf{X}\) is invertible and \[ \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1} = \bigl(\mathbf{R}^\top\mathbf{R}\bigr)^{-1} = \mathbf{R}^{-1}\bigl(\mathbf{R}^{-1}\bigr)^\top. \] That is, one only needs to invert a triangular matrix (which is easy).

QR decomposition of a matrix \(\mathbf{X}\)

qr(X)
## $qr
##      intcpt          x         x^2
## [1,]   -2.0  0.0000000 -10.0000000
## [2,]    0.5 -4.4721360   0.0000000
## [3,]    0.5  0.4472136   8.0000000
## [4,]    0.5  0.8944272  -0.9296181
## 
## $rank
## [1] 3
## 
## $qraux
## [1] 1.500000 1.000000 1.368524
## 
## $pivot
## [1] 1 2 3
## 
## attr(,"class")
## [1] "qr"

Note that the results are stored and shown in a compact form: the upper triangle of the $qr component contains the matrix \(\mathbf{R}\), while its lower triangle together with $qraux encodes the Householder reflections that determine \(\mathbf{Q}\).

Content of an object returned by the qr function

qrX <- qr(X)
names(qrX)
## [1] "qr"    "rank"  "qraux" "pivot"

Rank of the model matrix

qrX$rank
## [1] 3
r <- qrX$rank
n <- nrow(X)

\(\mathbf{Q}\) matrix

In our case, \(r = k = 3\), hence all columns of \(\mathbf{Q}\) form an orthonormal basis of \(M(\mathbf{X})\).

qr.Q(qrX)
##      [,1]       [,2] [,3]
## [1,] -0.5  0.6708204  0.5
## [2,] -0.5  0.2236068 -0.5
## [3,] -0.5 -0.2236068 -0.5
## [4,] -0.5 -0.6708204  0.5
Q <- qr.Q(qrX)

\(\mathbf{R}\) matrix

qr.R(qrX)
##      intcpt         x x^2
## [1,]     -2  0.000000 -10
## [2,]      0 -4.472136   0
## [3,]      0  0.000000   8
R <- qr.R(qrX)

Is it really \(\mathbf{X} = \mathbf{Q}\,\mathbf{R}\)?

Q %*% R - X                ## not really zeros
##      intcpt x          x^2
## [1,]      0 0 0.000000e+00
## [2,]      0 0 0.000000e+00
## [3,]      0 0 8.881784e-16
## [4,]      0 0 0.000000e+00
round(Q %*% R - X, 14)     ## numerically yes
##      intcpt x x^2
## [1,]      0 0   0
## [2,]      0 0   0
## [3,]      0 0   0
## [4,]      0 0   0

\(\mathbf{Q}\) matrix complemented by \(n - r\) columns to form an orthonormal basis of \(R^n\) (in our case \(R^4\))

The orthonormal basis of \(R^n\) will be stored in a matrix \(\mathbf{P}\).

(P <- qr.Q(qrX, complete = TRUE))
##      [,1]       [,2] [,3]       [,4]
## [1,] -0.5  0.6708204  0.5  0.2236068
## [2,] -0.5  0.2236068 -0.5 -0.6708204
## [3,] -0.5 -0.2236068 -0.5  0.6708204
## [4,] -0.5 -0.6708204  0.5 -0.2236068

Orthonormal basis of the residual space (orthogonal complement to \(M(\mathbf{X})\))

The last \(n-r\) columns of the matrix \(\mathbf{P}\) form an orthonormal basis of the residual space. We store this basis in a matrix \(\mathbf{N}\).

(N <- P[, (r + 1):n])      ## since n - r = 1 here, R drops the result to a vector
## [1]  0.2236068 -0.6708204  0.6708204 -0.2236068

Projection matrix \(\mathbf{H}\) into the regression space \(M(\mathbf{X})\)

(H <- Q %*% t(Q))
##       [,1]  [,2]  [,3]  [,4]
## [1,]  0.95  0.15 -0.15  0.05
## [2,]  0.15  0.55  0.45 -0.15
## [3,] -0.15  0.45  0.55  0.15
## [4,]  0.05 -0.15  0.15  0.95

Projection matrix \(\mathbf{M}\) into the residual space (orthogonal complement to \(M(\mathbf{X})\))

(M <- N %*% t(N))
##       [,1]  [,2]  [,3]  [,4]
## [1,]  0.05 -0.15  0.15 -0.05
## [2,] -0.15  0.45 -0.45  0.15
## [3,]  0.15 -0.45  0.45 -0.15
## [4,] -0.05  0.15 -0.15  0.05

Fitted values and residuals

(yhat <- H %*% y)
##      [,1]
## [1,] -9.4
## [2,] -9.8
## [3,] -0.2
## [4,] 19.4
(u <- M %*% y)
##      [,1]
## [1,]  0.4
## [2,] -1.2
## [3,]  1.2
## [4,] -0.4

Fitted values and residuals calculated using the solution to normal equations

(yhat_from_b <- X %*% b)
##      [,1]
## [1,] -9.4
## [2,] -9.8
## [3,] -0.2
## [4,] 19.4
(u_from_b <- y - yhat_from_b)
##      [,1]
## [1,]  0.4
## [2,] -1.2
## [3,]  1.2
## [4,] -0.4

Matrix \(\mathbf{X}^\top\mathbf{X}\) easily calculated from the \(\mathbf{R}\) matrix

crossprod(R)      ## = R'R = X'X
##        intcpt  x x^2
## intcpt      4  0  20
## x           0 20   0
## x^2        20  0 164
print(XtX)        ## previously calculated X'X
##        intcpt  x x^2
## intcpt      4  0  20
## x           0 20   0
## x^2        20  0 164

Matrix \(\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\) calculated from its Cholesky decomposition \(\mathbf{R}^\top\mathbf{R}\)

chol2inv(R)       ## inverse of X'X calculated from its Cholesky factor R (chol2inv exploits the fact that R is upper triangular)
##           [,1] [,2]      [,3]
## [1,]  0.640625 0.00 -0.078125
## [2,]  0.000000 0.05  0.000000
## [3,] -0.078125 0.00  0.015625
print(invXtX)     ## compare to previously calculated (X'X)^(-1)
##           intcpt    x       x^2
## intcpt  0.640625 0.00 -0.078125
## x       0.000000 0.05  0.000000
## x^2    -0.078125 0.00  0.015625
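The same inverse can also be obtained by literally following the formula \(\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1} = \mathbf{R}^{-1}\bigl(\mathbf{R}^{-1}\bigr)^\top\), i.e., by first inverting the upper triangular matrix \(\mathbf{R}\) by backward substitution. A brief illustrative alternative to chol2inv (the name Rinv is auxiliary):

Rinv <- backsolve(R, diag(ncol(R)))     ## R^(-1) via backward substitution
Rinv %*% t(Rinv)                        ## = R^(-1) * (R^(-1))' = (X'X)^(-1)
round(Rinv %*% t(Rinv) - invXtX, 13)    ## coincides (numerically) with the previously calculated inverse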

Solution to normal equations by solving \(\mathbf{R}\,\mathbf{b} = \mathbf{Q}^\top\mathbf{Y}\)

(b_from_QR <- backsolve(R, crossprod(Q, y)))
##       [,1]
## [1,] -6.25
## [2,]  4.80
## [3,]  1.25
print(b)          ## previously calculated b for comparison
##         [,1]
## intcpt -6.25
## x       4.80
## x^2     1.25
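Base R also provides convenience functions that operate directly on the object returned by qr and perform the above steps internally; they should reproduce the quantities already computed (a brief illustration only):

qr.coef(qrX, y)      ## solution to normal equations (same as b)
qr.fitted(qrX, y)    ## fitted values (same as yhat)
qr.resid(qrX, y)     ## residuals (same as u)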

Residual sum of squares

The residual sum of squares is also easily calculated using the \(\mathbf{N}\) matrix (orthonormal basis of the residual space), since \(\mathbf{M} = \mathbf{N}\mathbf{N}^\top\) and \(\mathbf{N}^\top\mathbf{N} = \mathbf{I}_{n-r}\): \[ \mathrm{SS}_e = \bigl(\mathbf{M}\mathbf{Y}\bigr)^\top\,\mathbf{M}\mathbf{Y} = \mathbf{Y}^\top\,\mathbf{M}^\top\mathbf{M}\,\mathbf{Y} = \mathbf{Y}^\top\,\mathbf{N}\mathbf{N}^\top\mathbf{N}\mathbf{N}^\top\,\mathbf{Y} = \mathbf{Y}^\top\,\mathbf{N}\mathbf{N}^\top\,\mathbf{Y} = \bigl\|\mathbf{N}^\top\,\mathbf{Y}\bigr\|^2. \]

(NtY <- crossprod(N, y))
##          [,1]
## [1,] 1.788854
(SSe <- as.numeric(crossprod(NtY)))
## [1] 3.2
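The same value is obtained directly from the residuals computed earlier, since \(\mathrm{SS}_e = \|\mathbf{u}\|^2\) (a quick cross-check):

sum(u^2)                     ## = 0.4^2 + 1.2^2 + 1.2^2 + 0.4^2 = 3.2
as.numeric(crossprod(u))     ## the same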




Conclusions

It should now be clear that when dealing with least squares, it is not necessary to calculate the matrix \(\mathbf{X}^\top\mathbf{X}\) explicitly, let alone its inverse. Everything can be obtained from a suitable decomposition of the model matrix \(\mathbf{X}\), and once this decomposition is available, all the rest is easy (also with respect to computational demands).
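As a final cross-check (not part of the original derivations), the standard R fitting function lm, which internally relies on the QR decomposition of the model matrix, reproduces the solution obtained above:

fit <- lm(y ~ x + x2, data = Data)
coef(fit)               ## should give -6.25, 4.80, 1.25 as calculated above
sum(residuals(fit)^2)   ## residual sum of squares, should equal 3.2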