NMST539 | Lab Session 4

Wishart, Hotelling, and Wilk Distribution

Summer Term 2021/2022 | 08/03/22

source file (UTF8 encoding)

Outline for the fourth NMST539 lab session

random matrices;
multivariate distributions: Wishart, Hotelling, and Wilk
multivariate tests for the mean vector (multivariate analogies of standard \(t\) tests);

The R-software is available for download from the website: https://www.r-project.org

A user-friendly interface (one of many): RStudio.

Manuals and introduction into R (in Czech or English):

Bína, V., Komárek, A. a Komárková, L.: Jak na jazyk R. (PDF súbor)
Komárek, A.: Základy práce s R. (PDF súbor)
Kulich, M.: Velmi stručný úvod do R. (PDF súbor)
De Vries, A. a Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)

Theoretical/statistical references

Hardle, W. and Simar.L.: Applied Multivariate Statistical Analysis. Springer, 2015
Mardia, K., Kent, J., and Bibby, J.:Multivariate Analysis, Academic Press, 1979.

1. The Wishart distribution

The Wishart distribution is a multivariate generalization of the univariate \(\chi^2\) distribution used for the variance inference in the univariate case. It is named in honor of John Wishart, who formulated this distribution in 1928. The Wishart distribution is used as an eficient tool for the analysis of the variance-covariance matrix of same random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n}\), for some \(n \in \mathbb{N}\), where \(\boldsymbol{X}_{i} = (X_{i 1}, \dots, X_{i p})^\top\) is a \(p\)-dimensional random vector (for instance, \(p\) different covariates/characteristics recorded on each subject).

The Wishart distribution is a multi-variate generalization of the \(\chi^2\) distribution in the following sence: for a normally distributed random sample (random vetors) \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n}\) drawn from some multivariate distribution \(N_{p}(\boldsymbol{0}, \Sigma)\), with the zero mean vector and \(\Sigma\) being some variance-covariance matrix (symmetric and positive definite), we have that \[ \mathbb{X}^{\top}\mathbb{X} \sim W_{p}(\Sigma, n), \] for \(\mathbb{X} = (\boldsymbol{X}_{1}^\top, \dots, \boldsymbol{X}_{n}^\top)^{\top}\) being the ‘’data matrix’’. Similarly, for a univariate random sample (univariate random variables) \(X_{1}, \dots, X_{n}\) drawn, for instance, from \(N(0,1)\), we have that \[ \boldsymbol{X}^{\top}\boldsymbol{X} \sim \chi_{n}^2, \] where \(\boldsymbol{X} = (X_{1}, \dots, X_{n})^\top\). Thus, for a sample of size \(n \in \mathbb{N}\) drawn from \(N(0,1)\) the corresponding distribution of \(\mathbb{X}^\top\mathbb{X}\) is \(W_{1}(1, n) \equiv \chi_{n}^2\). Thus, the Wishart distribution is a family of probability distributions defined over symmetric, non-negative-definite matrix-valued random variables (so called random matrices).

The corresponding density function of the Wishart distribution takes the form \[ f(\mathcal{X}) = \frac{1}{2^{np/2} |\Sigma|^{n/2} \Gamma_{p}(\frac{n}{2})} \cdot |\mathcal{X}|^{\frac{n - p - 1}{2}} e^{-(1/2) tr(\Sigma^{-1}\mathcal{X})}, \] for \(\mathcal{X}\) being a \(p\times p\) random matrix and \(\Gamma_{p}(\cdot)\) is a multivariate generalization of the Gamma function \(\Gamma(\cdot)\). There are various options (packages, libraries, and commands) for using/utilizing the Wishart distribution in the R software.

Individual tasks (theoretical and practical)

Consider a multivariate random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n}\) which comes from some general multivariate normal distribution \(N_{p}(\boldsymbol{\mu}, \Sigma)\), with some mean vector \(\boldsymbol{\mu} \in \mathbb{R}^p\) and the variance-covariance positive definite matrix \(\Sigma\). Show, that the random matrix defined as \(n\mathcal{S} = \mathbb{X}^\top \mathcal{H}\mathbb{X}\) follows the Wishart distribution \(W_{p}(\Sigma, n - 1)\), for \(\mathcal{H}\) being the idenpotent centering matrix \(\mathcal{H} = \mathbb{I}_{n} - \frac{1}{n}\boldsymbol{1}_{n}\boldsymbol{1}_{n}^\top\).

use the help session in R to find out more about some standard commands used in R for the Wishart distribution (in particular, check the help session for the command rWishart());

some other commands and additional extensions are available in some packages which need to be downloaded and installed separately, e.g. function Wishart() in the library ‘MCMCpack’, or Wishart() in the library ‘mixAK’, or rwishart() in the library ‘dlm’, dist.Wishart() in the library ‘LaplacesDemon’, and others;

A simple random sample from the univariate Wishart distribution \(W_{1}(\Sigma = 1, n = 10)\) (equivalently a \(\chi^2\) distribution with \(n = 10\) degrees of freedom) can be obtained, for instance, as

S <- as.matrix(1)
sample <- rWishart(10, df = 10, S)

For a general \(p \in \mathbb{N}\) the same command can be used, however, with the corresponding variance-covariance matrix \(\Sigma\) to be properly specified as a symetric, positive nefinite matrix, for instance:

Sigma <- cbind(c(1,0,0,0), c(0,1,0,0), c(0,0,1,0), c(0,0,0,1))
rWishart(2, df = 100, Sigma) ### sample of size two

## , , 1
## 
##           [,1]       [,2]       [,3]       [,4]
## [1,] 90.254198  -3.018348   1.361425  -3.778287
## [2,] -3.018348 104.745611 -11.729479 -15.193431
## [3,]  1.361425 -11.729479  90.766162 -13.213839
## [4,] -3.778287 -15.193431 -13.213839 111.550816
## 
## , , 2
## 
##             [,1]       [,2]      [,3]       [,4]
## [1,]  71.1019475  0.8052563  1.079318 -12.743513
## [2,]   0.8052563 95.6501604 -2.338480   7.394609
## [3,]   1.0793177 -2.3384802 99.048605   4.075932
## [4,] -12.7435128  7.3946089  4.075932 124.372176

We can also use the standard approach for generating a sample from the \(\chi^2\) distribution – the R function rchisq() – and, using some histograms, we can empirically compare the univariate Wishart distribution and the \(\chi^2\) distribution. For large sample size \(n \in \mathbb{N}\) we they should be quite similar/identical.

set.seed(1234)
sampleWishart <- rWishart(5000, df = 10, S)
sampleChiSq <- rchisq(5000, df = 10)

par(mfrow = c(1,2))
hist(sampleWishart, col = "lightblue", main = expression(paste("Wishart Distribution", sep ="")), xlim = c(0, 40), breaks = 30, freq = F)
lines(density(sampleWishart), col = "red", lwd = 2)
hist(sampleChiSq, col = "lightblue", main = expression(paste(chi^2, "Distribution", sep = "")), xlim = c(0,40), breaks = 30, freq = F)
lines(density(sampleChiSq), col = "red", lwd = 2)

On the other had, the two dimesional Wishart distribution (a stochastic distribution of random symmetric matrices of the type \(2 \times 2\)) is more challenging to visualize properly (in a useful way). One possibility is to consider, for instance, a plot which represents each \(2 \times 2\) type random matrix as an oriented vector in a two dimensional real plane—thus, a random matrix \(\boldsymbol{X}_i\) is composed of two vectors \((x_{11}, x_{21})^\top\) and \((x_{12}, x_{22})^\top\) (columns of the random matrix).

These two vectors define (in a two dimensional real plane \(\mathbb{R}^2\) or the \(xy\) scatterplot respectively) the starting point and the ending point of some line of a finite length. The starting point of the line is represented by the first vector \((x_{11}, x_{21})^\top\) (red dots in the plot below) while the ending point is represented by the vector \((x_{12}, x_{22})^\top\).

The plot may not be too much intuitive but it allows for a straightforward representation of the raw random values—random matrices in this case.

library(MASS)
sample2DWishart <- rWishart(100, df = 10, cbind(c(1,1), c(1,3)))

plot(0,0, pch = "", xlim = c(min(sample2DWishart[1,,]), max(sample2DWishart[1,,])), ylim = c(min(sample2DWishart[2,,]), max(sample2DWishart[2,,])),
     xlab = "First column vector", ylab = "Second column vector")

contour(kde2d(sample2DWishart[1,1,], sample2DWishart[2,1,], n=10), drawlabels=FALSE, nlevels=10, col="gray53", add=TRUE)
contour(kde2d(sample2DWishart[1,2,], sample2DWishart[2,2,], n=10), drawlabels=FALSE, nlevels=10, col="lightpink2", add=TRUE, lty = 3)

for(i in 1:100){
  lines(sample2DWishart[1,,i], sample2DWishart[2,,i], col = "red")
  points(sample2DWishart[1,1,i], sample2DWishart[2,1,i], pch = 21, bg = "red", cex = 0.8)
}

Other options to visualise a random sample from the Wishart distribution are, for instance, based on eigen values of each random matrix (eigen vectors respectively) or some other partial information taken from each random matrix. Such methods however, usually do not allow for a straightforward visualisation/representation of the raw sample points/matrices.

Individual tasks

Try to play with the parameters of the Wishart distribution to see different constelations of the generated random vectors in the plot above. Consider also some other Wishart distribution for some \(p \in \mathbb{N}\) such that \(p > 2\) and generate a random sample from this distribution… try to think of some other visualization options.

Consider some general matrix \(A \in \mathbb{R}^{p \times q}\) for some \(q \in \mathbb{N}\) such that \(p \neq q\) and use some visualization tools to empiricaly verify (for instance, by some histograms, or KDE) that \(A^\top \mathbb{M} A \sim W_{q}(A^\top\Sigma A, n)\), where \(\mathbb{M} \sim W_{p}(\Sigma, n)\);

2. Hotelling’s \(\boldsymbol{T^2}\) Distribution

In a similar manner as we define a classical \(t\)-distribution in a univariate case (i.e., a standard normal \(N(0,1)\) random variable devided by a square root of another stochastically independent and \(\chi^2\) distributed random variable normalized by its degrees of freedom) we define a multivariate generalization (the Hotelling’s \(T^{2}\) distribution) as \[ n \boldsymbol{Y}^{\top} \mathbb{M}^{-1} \boldsymbol{Y} \sim T^{2}(n, p), \] where \(p \in \mathbb{N}\) is the dimension of the random vector \(Y \sim N_{p}(0, \mathbb{I})\) and \(n \in \mathbb{N}\) being the parameter of the Wishart distribution for the random matrix \(\mathbb{M} \sim W_{p}(\mathbb{I}, n)\). Moreover, the random vector \(\boldsymbol{Y}\) is assumed to be independent of the random matrix \(\mathbb{M}\).

A special case for \(p = 1\) gives the standard Fisher distribution with one and \(n\) degrees of freedom (equivalently a square of the \(t\)-distribution with \(n\) degrees of freedom). However, the Hotelling’s \(T^2\) distribution with parameters \(p, n \in \mathbb{N}\) is, in general, closely related with the Fisher’s F distribution. It holds, that

\[ T^{2}(p, n) \equiv \frac{n p}{n - p + 1}F_{p, n - p + 1}. \]

Therefore, the standard univariate Fisher’s F distribution can be effectively used to draw critical values also from the Hotelling’s \(T^2\) distribution. The corresponding transformation between the Hotelling’s \(T^2\) distribution and the Fisher’s F distribution only depends on parameters \(n, p \in \mathbb{N}\).

The role of the given parameters and different parameter settings in the Hotelling’s \(T^2\) distribution can be (a little) visualized, for instance, by utilizing the Fisher’s F distribution and the knowledge about its mean and variance:

samples <- NULL
samples <- cbind(samples, rf(n = 1000, df1 = 1, df2 = 1))
samples <- cbind(samples, rf(n = 1000, df1 = 1, df2 = 10))

samples <- cbind(samples, rf(n = 1000, df1 = 10, df2 = 1))
samples <- cbind(samples, rf(n = 1000, df1 = 100, df2 = 1))

samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 1))
samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 10))

samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 100))
samples <- cbind(samples, rf(n = 1000, df1 = 1000, df2 = 1000))

par(mfrow = c(4,2))
for (i in 1:dim(samples)[2]){
  hist(samples[,i], xlab=paste("Sample no.", i, sep = ""), col = "yellow", main = "", freq = F, breaks = 50)
  lines(density(samples[,i]), col = "red")
}

Recall, that for some random variable \(X\) with the Fisher’s F distribution \(F_{df_1, df_2}\) the following equations for the moment properties hold:

expected value \(E X = \frac{df_{2}}{df_{2} - 2}\), for \(df_2 > 2\);
variance \(Var(X) = \frac{2 df_{2}^2 (df_1 + df_2 - 2)}{df_1 (df_{2} - 2)^2 (df_2 - 4)}\) for \(df_2> 4\);

Moreover, using the relationship between the Hotelling’s \(T^2\) distribution and the Fisher’s F distriubtion we can effectively use the \(F\)-distribution to draw the corresponding critical values when needed for some statistical tests.

For some multivariate random sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}, \Sigma)\) for some unknown mean vector \(\boldsymbol{\mu} \in \mathbb{R}^{p}\) and some variance-covariance matrix \(\Sigma\), we have that \[ (n - 1)\Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \sim T^2(p, n - 1), \] which can be also equivalently expressed as \[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \sim F_{p, n - p}. \] This can be now used to construct confidence regions for the unknown mean vector \(\boldsymbol{\mu}\) for testing hypothesis about the true value of the vector of parameters \(\boldsymbol{\mu} = (\mu_{1}, \dots, \mu_{p})^{\top}\). However, rather than constructing a confidence region for \(\boldsymbol{\mu}\) (which can be impractical in higher dimensions for even slightly larger values of \(p\)) one focusses on construction confidence intervals for the elements of \(\boldsymbol{\mu}\) such that the mutual coverage is under control (usually we require a simultaneous coverage of \((1 - \alpha)\times 100~\%\) for some small \(\alpha \in (0,1)\)).

For a hypothesis test

\[ H_{0}: \boldsymbol{\mu} = \boldsymbol{\mu}_{0} \in \mathbb{R}^{p} \]

\[ H_{1}: \boldsymbol{\mu} \neq \boldsymbol{\mu}_{0} \in \mathbb{R}^{p} \]

we can use the following test statistic:

\[ (n - 1)\Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big), \]

which, under the null hypothesis, follows the \(T^2(p, n - 1)\) distribution. Equivalently we also have that

\[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big) \]

follows the Fisher \(F\)-distribution with \(p\) and \(n - p\) degrees of freedom.

In the R software we can use the library DescTools (the library needs to be installed by running the command install.packages("DescTools")) which is loaded into the R environment by the command

library(DescTools)

Now we can use the function HotellingsT2Test() to perform the test on multivariate mean vector (use the help session to find out how the function works).

Using the same approach we can also construct the confidence elipsoid for \(\boldsymbol{\mu} \in \mathbb{R}^p\) - it holds that \[ \frac{n - p}{p} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big)^\top \mathcal{S}^{-1} \Big(\overline{\boldsymbol{X}} - \boldsymbol{\mu}_{0}\Big) \sim F_{p, n - p}, \] and therefore, the following set \[ \left\{\boldsymbol{\mu} \in \mathbb{R}^p;~ \Big( \overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big)^\top \mathcal{S}^{-1} \Big( \overline{\boldsymbol{X}} - \boldsymbol{\mu}\Big) \leq \frac{p}{n - p} F_{p, n- p}(1 - \alpha) \right\} \] is a confidence region (the interior of an iso-distance ellipsoid in \(\mathbb{R}^p\)) for the mean vector o\(\boldsymbol{\mu} \in \mathbb{R}^p\) with the confidence level of \(1 - \alpha\) for some alpha \(\alpha \in (0,1)\).

A brief example on how it works (example from the lecture notes):

library(mvtnorm)
s=matrix(c(1,-0.5,-0.5,1),2);x=seq(-3,3,by=0.015) ### variance-covariance matrix

contour(x,x,outer(x,x, function(x,y){dmvnorm(cbind(x,y),sigma=s)}))
n <- 20
X <- rmvnorm(n,sigma=s)
m <- apply(X,2,mean)
S <- cov(X)
points(m[1],m[2],pch=8,col="red",cex=2)

S1=solve(S)
contour(x,x,outer(x,x,function(x,y){(n-2)*
                                      apply(t(t(cbind(x,y))-m),1,function(x){t(x)%*%S1%*%x})<
                                      2*qf(0.95,2,n-2)}),col="red",add=TRUE)
bodyx=m[1]+c(-1,1)*sqrt(S[1,1]*2*qf (0.95,2,n-2) /(n-2))
bodyy=m[2]+c(-1,1)*sqrt(S[2,2]*2*qf (0.95,2,n-2) /(n-2))
polygon(x=bodyx[c(1,1,2,2,1)],y=bodyy[c(1,2,2,1,1)],border="blue")

(which is the 95% confidence region the true mean vector)

The same picture for \(n = 200\):

contour(x,x,outer(x,x, function(x,y){dmvnorm(cbind(x,y),sigma=s)}))
n <- 200
X <- rmvnorm(n,sigma=s)
m <- apply(X,2,mean)
S <- cov(X)
points(m[1],m[2],pch=8,col="red",cex=2)

S1=solve(S)
contour(x,x,outer(x,x,function(x,y){(n-2)*
                                      apply(t(t(cbind(x,y))-m),1,function(x){t(x)%*%S1%*%x})<
                                      2*qf(0.95,2,n-2)}),col="red",add=TRUE)
bodyx=m[1]+c(-1,1)*sqrt(S[1,1]*2*qf (0.95,2,n-2) /(n-2))
bodyy=m[2]+c(-1,1)*sqrt(S[2,2]*2*qf (0.95,2,n-2) /(n-2))
polygon(x=bodyx[c(1,1,2,2,1)],y=bodyy[c(1,2,2,1,1)],border="blue")

Example 1: Difference of two multivariate means with the same (unknown) variance-covariance matrix

In an analogous way one can also construct a two sample Hotelling test to compare two population means.

We assume a multivariate sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}_{1}, \Sigma)\) and some another sample \(\boldsymbol{Y}_{1}, \dots, \boldsymbol{Y}_{} \sim N_{p}(\boldsymbol{\mu}_{2}, \Sigma)\), with some generally different mean parameter \(\boldsymbol{\mu}_{1} \neq \boldsymbol{\mu}_{2}\). We are interested in testing the null hypothesis \[ H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_{2} \] against the alternative hypothesis, that the null does not hold. We have that the following holds: \[ (\overline{X}_{n} - \overline{Y}_{m}) \sim N_{p}\Bigg(\boldsymbol{\Delta}, \frac{n + m}{n m} \Sigma\Bigg), \] for \(\boldsymbol{\Delta} = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \in \mathbb{R}^p\). Similarly, it also holds that \[ n\mathcal{S}_{1} + m\mathcal{S}_{2} \sim W_{p}(\Sigma, n + m - 2), \] where \(n \mathcal{S}_{1} = \mathbb{X}^\top \mathcal{H}\mathbb{X} \sim W_p(\Sigma, n - 1)\) and \(m \mathcal{S}_{2} = \mathbb{Y}^\top \mathcal{H}\mathbb{Y} \sim W_p(\Sigma, m - 1)\) respectively are the (rescaled) empirical estimates of the variance-covariance matrix \(\Sigma\), given the first sample (data matrix) \(\mathbb{X} = (\boldsymbol{X}_1, \dots, \boldsymbol{X}_n)^\top\) and the second sample (data matrix) \(\mathbb{Y} = (\boldsymbol{Y}_1, \dots, \boldsymbol{Y}_m)^\top\) respectively. The distribution of the sum \(n\mathcal{S}_{1} + m\mathcal{S}_{2}\) above follows by the generic property of the Wishart distribution (similar to the generic property for the standard \(\chi^2\) distribution where for \(X \sim \chi^2_{n_1}\) and \(Y \sim \chi^2_{n_2}\) stochastically independent, it holds that \(X + Y \sim \chi^2_{n_1 + n_2}\)).

Then the rejection region is defined as \[ \frac{nm(n + m - p - 1)}{p(n + m)^2}(\overline{\boldsymbol{X}}_{n} - \overline{\boldsymbol{Y}}_{m})^{\top} \mathcal{S}^{-1} (\overline{\boldsymbol{X}}_{n} - \overline{\boldsymbol{Y}}_{m}) \geq F_{p, n + m - p - 1}(1 - \alpha). \] for the weighted average \(\mathcal{S} = (n + m)^{-1} \cdot (n \mathcal{S}_1 + m \mathcal{S}_2)\).

This all follows from the definition of the Hotelling \(T^2\) distribution given above above where we need to firstly define the following quantities:

Normally distributed random vector with \(N_{p}(\boldsymbol{0}, \mathbb{I})\):
\[ \boldsymbol{Y} = \Big(\frac{nm}{n + m}\Big)^{1/2} \cdot \Sigma^{-1/2} (\overline{\boldsymbol{X}}_n - \overline{\boldsymbol{Y}}_m) \sim N_{p}(\boldsymbol{0}, \mathbb{I}) \]
Random matrix with the Wishart distribution \(W_p(\mathbb{I}, n + m - 2)\): \[ \mathbb{M} = \Sigma^{-1/2} (n \mathcal{S}_1 + m \mathcal{S}_2) \Sigma^{-1/2} \sim W_p\Big(\Sigma^{-1/2}(\Sigma^{1/2} \Sigma^{1/2})\Sigma^{-1/2}, n + m - 2\Big) \equiv W_p(\mathbb{I}, n + m - 2) \]
Now, following the definition of the Hotelling \(T^2\) distribution given by the exression \[ (n + m - 2) \boldsymbol{Y}^\top\mathbb{M}^{-1}\boldsymbol{Y} \sim T^2(n + m - 2, p) \] we obtain (using the notation for \(\mathcal{S}\) and the relationship betweeen the Hotelling distribution and the Fisher distribution) the expression provided above.

Example 2: Difference of two multivariate means with unequal (known) variance-covariance matrix

Now, we assume a multivariate sample \(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_{n} \sim N_{p}(\boldsymbol{\mu}_{1}, \Sigma_{1})\) and some another sample \(\boldsymbol{Y}_{1}, \dots, \boldsymbol{Y}_{M} \sim N_{p}(\boldsymbol{\mu}_{2}, \Sigma_{2})\), with some generally different mean parameter \(\boldsymbol{\mu}_{1} \neq \boldsymbol{\mu}_{2}\). We are interested in testing the null hypothesis \[ H_{0}: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_{2} \] against the alternative hypothesis, that the null does not hold. Again, we have that the following holds: \[ (\overline{X}_{1} - \overline{X}_{2}) \sim N_{p}\Bigg(\boldsymbol{\Delta}, \frac{\Sigma_{1}}{n} + \frac{\Sigma_{2}}{m}\Bigg), \] and therefore, it also holds that \[ (\overline{X}_{1} - \overline{X}_{2})^\top \Big(\frac{\Sigma_{1}}{n} + \frac{\Sigma_{2}}{m}\Big)^{-1} (\overline{X}_{1} - \overline{X}_{2}) \sim \chi_{p}^{2}. \]

Individual tasks

download and install the R library Hotelling and DescTools;
for one sample and two sample Hotelling tests use the command HotellingsT2Test() from the DescTools package;
for two sample Hotelling tests you can also use the command hotelling.test() from the Hotelling package;
use the help session to find out more about both commands, hotelling.stat() and hotelling.test();

library("Hotelling")

There is a dataset called container.df available in the package ‘Hotelling’. The data represent some concentration measurements on nine different elements (see the help session ?container.df for further details).

data(container.df)
summary(container.df)

##        gp            Ti                Al               Fe         
##  Min.   :1.0   Min.   :0.02800   Min.   :0.6590   Min.   :0.02700  
##  1st Qu.:1.0   1st Qu.:0.03175   1st Qu.:0.7405   1st Qu.:0.02975  
##  Median :1.5   Median :0.03300   Median :0.7850   Median :0.06950  
##  Mean   :1.5   Mean   :0.03350   Mean   :0.8047   Mean   :0.07665  
##  3rd Qu.:2.0   3rd Qu.:0.03700   3rd Qu.:0.8730   3rd Qu.:0.12200  
##  Max.   :2.0   Max.   :0.03900   Max.   :0.9410   Max.   :0.16800  
##        Mn               Mg               Ca              Ba         
##  Min.   :0.0020   Min.   :0.0900   Min.   :5.511   Min.   :0.00900  
##  1st Qu.:0.0020   1st Qu.:0.1130   1st Qu.:6.422   1st Qu.:0.01200  
##  Median :0.0030   Median :0.1595   Median :6.989   Median :0.01550  
##  Mean   :0.0034   Mean   :0.1759   Mean   :7.001   Mean   :0.01615  
##  3rd Qu.:0.0050   3rd Qu.:0.2362   3rd Qu.:7.578   3rd Qu.:0.02000  
##  Max.   :0.0050   Max.   :0.2700   Max.   :7.780   Max.   :0.02300  
##        Sr               Zr         
##  Min.   :0.0130   Min.   :0.00900  
##  1st Qu.:0.0150   1st Qu.:0.01000  
##  Median :0.0160   Median :0.01150  
##  Mean   :0.0162   Mean   :0.01125  
##  3rd Qu.:0.0170   3rd Qu.:0.01200  
##  Max.   :0.0190   Max.   :0.01400

The estimates for the mean vector and variance-covariance matrix can be obtained as

apply(container.df[,2:10], 2, mean) ### mean vector estimate

##      Ti      Al      Fe      Mn      Mg      Ca      Ba      Sr      Zr 
## 0.03350 0.80470 0.07665 0.00340 0.17590 7.00055 0.01615 0.01620 0.01125

cov(container.df[,2:10])            ### variance-covariance estimate

##               Ti           Al            Fe            Mn            Mg
## Ti  1.194737e-05 2.289474e-04  1.527632e-04  4.421053e-06  1.975789e-04
## Al  2.289474e-04 7.775589e-03  3.868363e-03  1.112842e-04  5.884863e-03
## Fe  1.527632e-04 3.868363e-03  2.465608e-03  7.072632e-05  3.294963e-03
## Mn  4.421053e-06 1.112842e-04  7.072632e-05  2.147368e-06  9.583158e-05
## Mg  1.975789e-04 5.884863e-03  3.294963e-03  9.583158e-05  4.784832e-03
## Ca  6.710526e-06 2.654805e-03  2.015711e-04  1.297895e-05 -4.767895e-05
## Ba  1.355263e-05 3.768895e-04  2.209500e-04  6.410526e-06  3.091211e-04
## Sr -2.105263e-07 1.484211e-06 -1.876842e-05 -6.631579e-07 -1.371579e-05
## Zr  1.921053e-06 6.907895e-05  2.451316e-05  6.842105e-07  4.423684e-05
##               Ca            Ba            Sr            Zr
## Ti  6.710526e-06  1.355263e-05 -2.105263e-07  1.921053e-06
## Al  2.654805e-03  3.768895e-04  1.484211e-06  6.907895e-05
## Fe  2.015711e-04  2.209500e-04 -1.876842e-05  2.451316e-05
## Mn  1.297895e-05  6.410526e-06 -6.631579e-07  6.842105e-07
## Mg -4.767895e-05  3.091211e-04 -1.371579e-05  4.423684e-05
## Ca  4.574089e-01  5.564921e-04 -3.695368e-04 -1.588158e-05
## Ba  5.564921e-04  2.139737e-05 -1.821053e-06  2.644737e-06
## Sr -3.695368e-04 -1.821053e-06  2.484211e-06  1.210526e-06
## Zr -1.588158e-05  2.644737e-06  1.210526e-06  1.671053e-06

and we would like to perform a statistical test to find out whether the mean vector of the concentrations in the first container (gp == 1) and the second container (gp == 2) is the same or not. Try to perform a one sample Hotelling test to test whether the unknown vector of mean parameters equals some prespecified vector \(\boldsymbol{\mu}_{0} \in \mathbb{R}^{p}\).

For a two sample problem we can use the first covariate in the data to define two populations and we provide a test whether the mean vectors are equal or not. The corresponding Hotelling test statistic equals

split.data = split(container.df[,-1],container.df$gp)
x <- split.data[[1]]
y <- split.data[[2]]
statistic <- hotelling.stat(x, y)

and the corresponding test is performed by

hotellingTest <- hotelling.test(.~gp, data = container.df)
hotellingTest

## Test stat:  2043 
## Numerator df:  9 
## Denominator df:  10 
## P-value:  4.233e-09

Questions

What is the interpretation of the test results?
Try to perform a simple two sample \(t\)-test(s). What are the conclussions now?
What would be the corresponding simultaneous confidence intervales for the elements of the mean vector in this example?

The correponding simultaneous confidence intervals for all possible linear combinations \(\boldsymbol{a}^\top \boldsymbol{\mu}\) of the mean vector \(\boldsymbol{\mu} \in \mathbb{R}^{p}\) are given as

\[ P\Big(\forall \boldsymbol{a} \in \mathbb{R}^{p};~ \boldsymbol{a}^\top \boldsymbol{\mu} \in \big( \boldsymbol{a}^\top\overline{\boldsymbol{X}} - \sqrt{K_{\alpha} \boldsymbol{a}^\top \mathcal{S} \boldsymbol{a}}, \boldsymbol{a}^\top\overline{\boldsymbol{X}} + \sqrt{K_{\alpha} \boldsymbol{a}^\top \mathcal{S} \boldsymbol{a}} \big) \big)\Big) = 1 - \alpha, \] where \(K_{\alpha}\) is the corresponding quantile obtained from the Fisher \(F\) distribution in the form \(K_{\alpha} = \frac{p}{n - p} F_{p, n - p}(1 - \alpha)\) and \(\mathcal{S}\) is the sample variance-covariance matrix.

Fquant <- (9/(20 - 9)) * qf(1 - 0.05, 9, 11)
meanVec <- apply(container.df[,2:10], 2, mean)
LB <- diag(9) %*% meanVec - sqrt(Fquant * diag(diag(9) %*% cov(container.df[,2:10])  %*% diag(9)))
UB <- diag(9) %*% meanVec + sqrt(Fquant * diag(diag(9) %*% cov(container.df[,2:10])  %*% diag(9)))

confInt <- data.frame(LB, UB, meanVec,  row.names = names(container.df)[2:10])
names(confInt) <- c("Lower Boundary", "Upper Boundary", "Mean Estimate")

confInt

##    Lower Boundary Upper Boundary Mean Estimate
## Ti   0.0281791989    0.038820801       0.03350
## Al   0.6689600906    0.940439909       0.80470
## Fe   0.0002131292    0.153086871       0.07665
## Mn   0.0011442333    0.005655767       0.00340
## Mg   0.0694184851    0.282381515       0.17590
## Ca   5.9594482060    8.041651794       7.00055
## Ba   0.0090293265    0.023270674       0.01615
## Sr   0.0137737525    0.018626247       0.01620
## Zr   0.0092600784    0.013239922       0.01125

3. Wilk’s Lambda Distribution

The Wilks’s lambda distribution is defined from two independent Wishart distributed random matrices. The Wilk’s lambda distribution is a multivariate analogue for the univariate Fisher’s F distribution and it is used for inference on two variance-covariace matrices (in a similar way as the Fisher’s F distribution is used to perform an inference about two variacnce parameters).

For two independent random matrices \[ \mathbb{A} \sim W_{p}(\mathbb{I}, n) \quad \textrm{and} \quad \mathbb{B} \sim W_{p}(\mathbb{I}, m) \]

we define a random variable \(\frac{|\mathbb{A}|}{|\mathbb{A} + \mathbb{B}|}\) which is said to follow the Wilk’s Lambda distribution \(\Lambda(p, n, m)\). This distribution commnonly appears in the context of likelihood-ratio tests where \(n\) is typically the error degrees of freedom, and \(m\) is the hypothesis degrees of freedom, so that \(n+m\) is the total degrees of freedom.

In a similar way as we perform the analysis of variance for univariate linear regression (ANOVA) we can use a multivariate approach – so called, multivariate analysis of variance (MANOVA). The corresponding inference in MANOVA (\(p\)-values of the tests) are based on the Wilk’s lambda distribution (see the standard R command manova() for more details).

Individual tasks

Use some simulations to obtain some usefull insight into the Wilks Lambda distribution.
(for instance, use the following R code and try to modify it to obtain more complex results)

set.seed(1234)
I <- cbind(c(1,0,0), c(0,1,0), c(0,0,1))
A <- rWishart(100, df = 10, I)
B <- rWishart(100, df = 4, I)

WilksLambda <- apply(A, 3, det)/apply(A + B, 3, det)

hist(WilksLambda, col = "lightblue", freq = F, xlab = "Wilk's Lambda", ylab = "Density", ylim = c(0, 4.5))
lines(density(WilksLambda), col = "red", lwd = 2)

Use the univariate case (Wishart’s distributions for \(p = 1\) and verify (empirically), that the Wilk’s lambda distribution corresponds with the Fisher’s F distribution.

Homework Assignment

(Deadline: fifth lab session / 15.03.2022)

In the R library SMSdata (from the lecturer web site) there are different datasets available there. Apply one or two sample Hotelling \(T^2\) test to compare two theoretical mean vectors.

Use some intuitive motivation for the null and alternative hypothesis (a proper figure, sample characteristics, etc.);
Perform a formal statistical test and interpret the test result. The interpretation must be given in a way that even non-mathematicians/non-statisticians will understand;
Provide some inferential tool for the tested difference (e.g., some confidence region, or a set of confidence intervals);
Provide the solution on you website by Tuesday, 15.03.2022, 10:00 at latest.