NMST539 | Lab Session 2

Conditional and marginal distributions of random vectors

Summer Term 2021/2022 | 22/02/22

source file (UTF8 encoding)

Outline of the second NMST539 lab session:

  • matrix eigen decomposition, eigen values, and eigen vectors;
  • random vectors: conditional and marginal distributions;
  • multivariate normal distribution with R packages;

The first task of the lab session today is to recall some important matrix properties (e.g., the eigen value decomposition, properties of eigen values and eigen vectors) crucial for most of the multivariate statistical techniques discussed later (during the term). Some theoretical/practical illustrations will be used, and the multivariate normal distribution and its specific implementation in R will be discussed as well.

The R-software is available for download (GNU licence, free of charge) from the website: https://www.r-project.org

A user-friendly interface (one of many) which is, however, not necessary for a proper functioning of the R software: RStudio.

Some (rather brief) manuals and introduction tutorials for R (in Czech or English):

  • Bína, V., Komárek, A. and Komárková, L.: Jak na jazyk R. (PDF file)
  • Komárek, A.: Základy práce s R. (PDF file)
  • Kulich, M.: Velmi stručný úvod do R. (PDF file)
  • De Vries, A. and Meys, J.: R for Dummies. (ISBN-13: 978-1119055808)


Theoretical bibliography:
  • Härdle, W. and Simar, L.: Applied Multivariate Statistical Analysis. Springer, 2015.
  • Mardia, K., Kent, J., and Bibby, J.: Multivariate Analysis. Academic Press, 1979.



1. Eigen values and eigen vectors

Multivariate statistical methods are mostly based on some proper matrix decomposition. Usually we are either dealing with a decomposition of a rectangular data matrix – a data frame of the type \(n \times p\) – or, instead, a square matrix (\(p \times p\)) such as, for instance, the theoretical variance-covariance matrix, its sample counterpart respectively, or the precision/distance matrix.

The most important decomposition is the eigen value matrix decomposition (i.e., the eigendecomposition or spectral decomposition) of a square matrix into a set of so-called eigen vectors and the corresponding eigen values.

A nonzero vector \(\boldsymbol{v} \in \mathbb{R}^p\) is said to be an eigen vector of the matrix \(\boldsymbol{A} \in \mathbb{R}^{p \times p}\) if \[ \boldsymbol{A}\boldsymbol{v} = \lambda \boldsymbol{v}, \] for \(\lambda \in \mathbb{R}\) being some fixed number – the eigen value of the matrix \(\boldsymbol{A}\) corresponding to the eigen vector \(\boldsymbol{v} \in \mathbb{R}^p\).
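
For instance, for the symmetric matrix \(\boldsymbol{A} = \Big(\begin{array}{cc}2 & 1\\ 1 & 2\end{array}\Big)\) the characteristic equation \(\det(\boldsymbol{A} - \lambda \boldsymbol{I}) = (2 - \lambda)^2 - 1 = 0\) has the roots \(\lambda_1 = 3\) and \(\lambda_2 = 1\), and the corresponding eigen vectors are \(\boldsymbol{v}_1 = (1, 1)^\top\) and \(\boldsymbol{v}_2 = (1, -1)^\top\) (usually normalized to the unit length).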


Individual task


  • Recall some important matrix properties and the corresponding facts about the eigen values and eigen vectors.
  • Try to make your own R code which will be able to find the corresponding eigen values of a simple \(2 \times 2\) or \(3 \times 3\) square (symmetric) matrix (a small illustrative sketch is given below this list).
  • The spectral decomposition of a square matrix (of a type \(p \times p\)) in R (eigendecomposition - EIV) is implemented within the function eigen(). Use the help and learn how the function is supposed to be applied to find the EIV decomposition of some real (square) matrix.
  • The EIV decomposition of a square matrix \(\boldsymbol{A} \in \mathbb{R}^{p \times p}\) is generally defined as \[ \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{\Lambda}\boldsymbol{Q}^{-1} \] where \(\boldsymbol{\Lambda} \in \mathbb{R}^{p \times p}\) is a diagonal matrix with the corresponding eigen values on its diagonal. Matrix \(\boldsymbol{Q} \in \mathbb{R}^{p \times p}\) contains the eigen vectors (as columns).
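
A small illustrative sketch (the matrix A below is only an arbitrary symmetric choice): the eigen values of a \(2 \times 2\) matrix can be obtained from the roots of the characteristic polynomial and compared with the output of the eigen() function.

A <- matrix(c(2, 1, 1, 2), nrow = 2)    ## an illustrative symmetric matrix

## roots of the characteristic polynomial lambda^2 - tr(A) * lambda + det(A) = 0
trA  <- sum(diag(A))
detA <- det(A)
(lambda <- (trA + c(1, -1) * sqrt(trA^2 - 4 * detA)) / 2)

## the same eigen values (and the eigen vectors) via the eigen() function
EIV <- eigen(A)
EIV$values
EIV$vectors

## check of the decomposition A = Q Lambda Q^{-1}
Q      <- EIV$vectors
Lambda <- diag(EIV$values)
Q %*% Lambda %*% solve(Q)               ## should reproduce the matrix A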



2. Random vectors (joint, marginal, conditional distributions)

Multivariate statistical methods are based on the stochastic theory of random vectors (random matrices respectively). In this part we use a few illustrative (theoretical and practical) examples to highlight some crucial properties. The examples will also be used to illustrate some important approaches for simulating multivariate random samples (a general algorithm applicable not only in R).

Theoretical and practical examples
  1. Consider a uniform (two dimensional) distribution of a random vector \((X,Y)^\top\) over the set \(M = \{(x, y) \in \mathbb{R}^2;~0 < x < y < 1\}\).
  2. Find the explicit expression for the joint distribution – the density function \(f_{(X,Y)}(x,y)\) of the random vector \((X, Y)^\top\). Derive also the corresponding marginal densities \(f_X(x)\) and \(f_Y(y)\). Can you say something relevant about the dependence/independence of the random variables \(X\) and \(Y\) (elements of the random vector \((X, Y)^\top\))?


  3. Use the expressions above to simulate a two dimensional random sample \((X_1, Y_1)^\top, \dots, (X_n, Y_n)^\top\) from the joint distribution given by the two-dimensional density function \(f_{(X,Y)}(x,y)\) (from the previous example). Use some proper graphical tools to verify that the simulated random sample indeed follows the given joint distribution. Calculate important sample characteristics (sample means and the sample variance-covariance matrix).

    Practical hint
    • For an arbitrary random variable \(X\) with the distribution function \(F_{X}(x)\) we have that \[ P[X \leq x] = F_{X}(x), \] for all \(x \in \mathbb{R}\). If, in addition, the distribution function is absolutely continuous (and strictly increasing), then the quantile function \(F_X^{-1}\) is obtained as the inverse function of the distribution function \(F_{X}(x)\). Thus, for the random variable \(Z = F_X(X)\) it holds (following the definition of the distribution function and the quantile function) that \[ P[Z \leq z] = P[F_X(X) \leq z] = P[X \leq F_X^{-1}(z)] = F_{X}\Big( F_X^{-1}(z)\Big) = z \] for \(z\) from the unit interval \([0,1]\). This means that the random variable \(Z\) follows a uniform distribution on the interval \([0,1]\).

      This should be enough to generate a random sample basically from an arbitrary distribution (if the distribution function is known).
      An illustrative example is given below.

    • Consider a random variable \(X\) with the distribution given by the density function \(f(x) = 3 x^2\) for \(x \in (0,1)\) and \(f(x) = 0\) otherwise. We want to use the R program to simulate a random sample from such a distribution – however, without recalling the fact that the random variable \(X\) actually follows the Beta distribution with parameters \(\alpha = 3\) and \(\beta = 1\).

      The distribution function of the random variable \(X\) is \(F(x) = x^3\) for \(x \in [0,1]\) (while the distribution function is zero for \(x < 0\) and \(F(x) = 1\) for \(x > 1\)). Since the distribution function is absolutely continuous (and strictly increasing on \([0,1]\)), we can define the corresponding quantile function \(F^{-1}(q)\), for \(q \in [0,1]\), as the inverse function of \(F(x)\), i.e., \[ F^{-1}(q) = \sqrt[3]{q}, \] for \(q \in [0,1]\). Of course, it holds that \(F^{-1}(0) = 0\) and \(F^{-1}(1) = 1\). Thus, the random variable \(Z = F(X)\), for \(F(x) = x^3\) being the distribution function of \(X\), follows a uniform distribution on the interval \([0,1]\). Therefore, we can use a simple (pseudo-)random generator in R (function runif()) to generate a random sample \(Z_1, \dots, Z_n\) from the uniform distribution on the interval \([0,1]\).

      n <- 1000
      set.seed(1234)
      
      Z <- runif(n)
      par(mfrow = c(1,2))
      plot(ecdf(Z), main = "Empirical d.f. of Z")
      hist(Z, col = "gray", breaks = ceiling(sqrt(n)))

      Next, we will use the back-transformation \(X_i = F^{-1}(Z_i)\) in order to obtain the random variables \(X_i\) which will already follow the distribution given by the distribution function \(F_X(x)\), or the density function \(f(x)\) respectively.

      X <- Z^(1/3)
      xSeq <- seq(0,1, length = 1000)
      par(mfrow = c(1,2))
      plot(ecdf(X), main = "Empirical d.f. of X")
      lines(xSeq^3 ~ xSeq, col = "red", lty = 2, lwd = 2)
      legend("topleft", legend = c("Empirical d.f.", "Theoretical d.f."), col = c("black", "red"), lty = c(1,2), lwd = c(2,2))
      hist(X, freq = F, col = "gray", breaks = ceiling(sqrt(n)))
      lines(3 * xSeq^2 ~ xSeq, col = "red", lty = 2, lwd = 2)
      legend("topleft", legend = c("Empirical density", "Theoretical density"), col = c("black", "red"), lty = c(1,2), lwd = c(2,2))
    • Use the previous example for a larger sample size \(n \in \mathbb{N}\) and use some empirical tools to verify that the given approach indeed provides a sample from the right distribution. The number of bins in the histogram figure can be set by the additional parameter breaks = ....
    • In order to use the approach described above for generating a two dimensional random sample \((X_i,Y_i)^{\top}\), \(i = 1, \dots, n\), from some given joint distribution, it is enough to apply analogous steps separately for the random element \(X\) and for the element \(Y\) – provided that these two are stochastically independent. If the variables are stochastically dependent, one applies the analogous steps to the first element only and, afterwards, to the conditional distribution of the second element given the first one (a small generic sketch of this principle is given below, after the last example).

  4. Use the two-dimensional Pareto distribution given by the joint density function
    \[ f(x, y) = c(x + y - 1)^{-p - 2}, \] for \(x,y > 1\) and \(p > 0\). Find the value of the normalizing constant \(c > 0\) and the explicit formulas for the marginal distributions of both random elements. Derive the conditional distribution of \(X\) given \(Y = y\) and the conditional distribution of \(Y\) given \(X = x\).
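
Returning to the practical hint above (its last bullet point), the following minimal sketch illustrates the general principle on a deliberately different, purely illustrative distribution: the first element is generated from its marginal distribution and the second one from its conditional distribution given the first one.

set.seed(1234)
n <- 1000

x <- rexp(n, rate = 1)            ## marginal distribution of X (here exponential)
y <- runif(n, min = 0, max = x)   ## conditional distribution of Y given X = x (here uniform on (0, x))

## quick graphical and numerical checks of the simulated joint sample
plot(y ~ x, pch = 21, bg = "lightblue")
colMeans(cbind(x, y))             ## sample mean vector
cov(cbind(x, y))                  ## sample variance-covariance matrix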



Individual task


  • How would you use the approach described above to visually verify the independence/dependence of some given (two) random variables from the finite sample \((X_i, Y_i)^\top\), for \(i = 1, \dots, n\)?
  • A standard \(xy\) scatterplot is usually the most common choice; however, very often it is not an appropriate tool to judge the dependence/independence of two random variables. See, for instance, the figure below:

    n <- 100
    
    set.seed(1234)
    x <- rexp(n, rate = 1)
    y <- rexp(n, rate = 10)
    z <- x + rnorm(n, -1, 2)
    
    par(mfrow = c(1,2))
    plot(y ~ x, pch = 21, bg = "lightblue")
    plot(z ~ x, pch = 21, bg = "lightblue")

    The first figure shows two independent random variables \(X\) and \(Y\). Despite the fact that there are some clear clusters of points near the origin \([0,0]\), the random variables \(X\) and \(Y\) are indeed stochastically independent.

    On the other hand, the second figure shows the scatterplot for two dependent random variables \(X\) and \(Z\) – however, using just the scatterplot itself, we would perhaps have a tendency to consider \(X\) and \(Z\) to be rather independent.

    The whole picture, however, changes when we consider different scatterplots – instead of plotting the original data \(X_i\), \(Y_i\), and \(Z_i\) (for \(i = 1, \dots, n\)) we can plot analogous scatterplots for \(\widehat{F}_{X}(X_i)\), \(\widehat{F}_{Y}(Y_i)\), and \(\widehat{F}_{Z}(Z_i)\), where \(\widehat{F}_{X}\), \(\widehat{F}_{Y}\), and \(\widehat{F}_{Z}\) are the empirical distribution functions of \(X\), \(Y\), and \(Z\).

    qx <- NULL 
    qy <- NULL 
    qz <- NULL 
    
    ## ranks of the observations, i.e., values of the empirical distribution
    ## functions evaluated at the data (equivalently rank(x)/n, rank(y)/n, rank(z)/n)
    for (i in 1:n){
       qx <- c(qx, which(sort(x) == x[i]))
       qy <- c(qy, which(sort(y) == y[i]))
       qz <- c(qz, which(sort(z) == z[i]))
    } 
    
    qx <- qx / n
    qy <- qy / n
    qz <- qz / n
    
    par(mfrow = c(1,2))
    plot(qy ~ qx, pch = 21, bg = "lightblue", xlab = "", ylab = "")
    lines(lowess(qy ~ qx, f = 1), col = "red", lwd = 2)
    plot(qz ~ qx, pch = 21, bg = "lightblue", xlab = "", ylab = "")
    lines(lowess(qz ~ qx, f = 1), col = "red", lwd = 2)




3. Multivariate normal distribution

Library mvtnorm

Firstly, we need to make sure that the library for the multivariate normal distribution (R package mvtnorm) is installed and properly loaded in the R working environment (use the command library("mvtnorm")). If the library cannot be loaded, you need to install it first. The installation is performed fully automatically by typing the command install.packages("mvtnorm") in the R command line.

library("mvtnorm")

We can now use the loaded library to easily generate random values from some multivariate normal distribution with some pre-specified variance-covariance matrix \(\Sigma\) (a symmetric and positive definite matrix).
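
A brief (purely illustrative) sketch of the main functions provided by the package; the mean vector and the variance-covariance matrix below are an arbitrary choice only:

mu    <- c(0, 0)                                       ## illustrative mean vector
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)               ## illustrative variance-covariance matrix

dmvnorm(c(0, 0), mean = mu, sigma = Sigma)             ## joint density evaluated at the point (0, 0)
pmvnorm(upper = c(0, 0), mean = mu, sigma = Sigma)     ## P[X1 <= 0, X2 <= 0]
rmvnorm(5, mean = mu, sigma = Sigma)                   ## five observations from N_2(mu, Sigma)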

For more details about the library (including a PDF manual document) see the library website on R cran:
https://cran.r-project.org/web/packages/mvtnorm/index.html



Comment


It is useful to always keep in mind that every time we are about to use some random generator in R (or any other software) it is good to predefine the initial value of the generator (in R there is a command called set.seed() to do the job). By doing so one makes sure that the obtained results are always reproducible and they can be easily verified.

For some specific purposes (master thesis, scientific paper, simulations, etc.) this is even a strictly required step.

For instance, try the following command (and compare the outputs with each other):

(sample <- rnorm(10))
##  [1]  1.1918528  1.3658102 -0.7606006 -0.9737320  1.9097989 -1.7529179
##  [7]  0.9556914  2.3340616  1.1016724  1.0919802

and compare it with the same command, however, with the initial state of the generator set by the set.seed(1234) command (any numerical value can be used as an input parameter; however, it must be the same every time in order to obtain the same output):

set.seed(1234)
(sample <- rnorm(10))
##  [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559
##  [7] -0.5747400 -0.5466319 -0.5644520 -0.8900378

Is it clear what the difference between these two outputs is?
Producing results which are easily reproducible and can be verified is the key issue here.





Example

Let us start with a simple (two-dimensional) example: we are going to generate a random sample of size \(n \in \mathbb{N}\) from the two-dimensional normal distribution

\(\boldsymbol{X}_{1}, \dots, \boldsymbol{X}_n \sim N_{2}\Big(\boldsymbol{\mu} = \Big(\begin{array}{c}2\\3\end{array}\Big), \Sigma = \Big(\begin{array}{cc}10^2 & 6^2\\ 6^2 & 6^2\end{array}\Big) \Big)\).

for \(\boldsymbol{X}_{i} = (X_{i 1}, X_{i 2})^\top\).

Remember that in the case of the one-dimensional random generator for the normal distribution in R (command rnorm()) one needs to specify the standard deviation instead of the variance (compare the help sessions ?rnorm and ?rmvnorm).

  • What is the marginal distribution of the elements – random variables \(X_{i 1}\) and \(X_{i 2}\)?
  • Can we use the standard – one dimensional random generator rnorm() to simulate both variables individually and, later, to combine them into the vector to form the multivariate – two dimensional normal distribution?
  • Verify by yourself that the variance-covariance matrix \(\Sigma\) is a symmetric and positive definite matrix (one possible check is sketched below this list).
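
One possible way to perform the last check in R (a minimal sketch using the matrix \(\Sigma\) from the example below): symmetry can be verified directly and positive definiteness via the eigen values (all of them must be strictly positive).

Sigma <- matrix(c(10^2, 6^2, 6^2, 6^2), 2, 2)

isSymmetric(Sigma)       ## TRUE for a symmetric matrix
eigen(Sigma)$values      ## all eigen values positive => positive definite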

The sample of size \(n=1000\) from the given two-dimensional normal distribution can be generated by the following command:

set.seed(1234)
s1 <- rmvnorm(1000, c(2, 3), matrix(c(10^2, 6^2, 6^2, 6^2),2,2))

And we can plot the generated random sample into a nice scatterplot using, for instance, the standard R command plot():

plot(s1, pch=21, xlim=c(-35, 35), ylim=c(-35,35), xlab="Marginal No. 1", ylab="Marginal No. 2", bg = "lightblue", cex = 0.8)

Using now the mean estimates and the sample variance-covariance matrix we can easily draw some ‘depth’ contours in the figure. Another option, of course, is to use the theoretical expression for the multivariate normal density, as we know the distribution we generated from (which is usually not the case in real life situations). However, in the following, we will rather use the generated sample and we calculate the sample mean vector and the sample variance-covariance matrix.

center <- apply(s1, 2, mean)   ## sample mean vector
sigma <- cov(s1)               ## sample variance-covariance matrix

The two dimensional normal density function is defined as

\(f(\boldsymbol{x}) = f(x_{1}, x_{2}) = \frac{1}{2 \pi |\Sigma|^{1/2}} \exp \left\{ -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \right\}\),

where \(\boldsymbol{x} = (x_{1}, x_{2})^\top \in \mathbb{R}^2\), \(\boldsymbol{\mu} = (\mu_{1}, \mu_{2})^\top\) is the mean vector, and \(\Sigma\) is some given variance-covariance matrix (which is symmetric and positive definite).

Thus, we also need the inverse of the sample variance-covariance matrix \(\widehat{\Sigma}\) to be able to draw the corresponding contours. The R function ellipse() in the following code is responsible for calculating the value of the two dimensional normal density function when plugging in the appropriate sample estimates and the margin values.

sigma.inv <- solve(sigma)    ## inverse of the sample variance-covariance matrix

ellipse <- function(s, t){
                    u <- c(s, t) - center
                    e <- (-1) * (u %*% sigma.inv %*% u) / 2
                    exp(e) / (2 * pi * sqrt(det(sigma)))}   ## two-dimensional normal density

This can be now applied in the following way:

n <- 60                   ## size of the evaluation grid (in each direction)
x <- (0:(n-1))*2 - 50     ## grid points for the first margin
y <- (0:(n-1))*2 - 50     ## grid points for the second margin

## the density evaluated over the whole (x, y) grid (column-wise ordering)
z <- mapply(ellipse, as.vector(rep(x,n)), as.vector(outer(rep(0,n), y, `+`)))

And the final plot is produced by the following couple of the commands:

plot(s1, pch=21, xlim=c(-35, 35), ylim=c(-35,35), xlab="Margin No. 1", ylab="Margin No. 2", bg = "lightgray", cex = 0.8)
contour(x,y,matrix(z,n,n), nlevels = 10, col = terrain.colors(10), add=TRUE, lwd = 1)
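
Alternatively (a small sketch, assuming the mvtnorm library is still loaded), the density values over the same grid can be obtained directly from the dmvnorm() function instead of the hand-made ellipse() function:

grid <- expand.grid(x = x, y = y)                              ## all grid points (first margin varies fastest)
z2 <- dmvnorm(as.matrix(grid), mean = center, sigma = sigma)   ## density at the grid points

plot(s1, pch=21, xlim=c(-35, 35), ylim=c(-35,35), xlab="Margin No. 1", ylab="Margin No. 2", bg = "lightgray", cex = 0.8)
contour(x, y, matrix(z2, n, n), nlevels = 10, col = terrain.colors(10), add = TRUE, lwd = 1)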



Individual task


  • Try various settings for the parameters defining the two dimensional normal distribution (the mean vector and the variance-covariance matrix) used above and try to understand what happens with the distribution of the points, the rotation of the structure, and the corresponding contours when changing the values of the given parameters.

  • Use two different samples, each simulated from a two dimensional normal distribution with different parameters, and draw the samples with the corresponding contours into one plot only. Again try to use different parameter settings to see what happens.

  • Try to give a reasonable and sensible interpretation (in a way that non-statisticians or non-mathematicians would understand) of the parameters used in the models.

  • Can you think of some reasonable and meaningful interpretation of the plotted contours in the figure above? Why are they called depth contours? Is there some straightforward relationship with quantiles?

  • Think of some linear combination of the random elements \(X_{i1}\) and \(X_{i2}\) in the sample you generated from the two dimensional normal distribution. What is the distribution of this linear combination? Use some graphical tools to verify your conclusion (a possible graphical starting point is sketched below).
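
A possible graphical starting point (a minimal sketch only, using the sample s1 generated above and an arbitrarily chosen linear combination; the theoretical derivation is left as a part of the task):

W <- s1[,1] + 2 * s1[,2]      ## an arbitrarily chosen linear combination

par(mfrow = c(1,2))
hist(W, freq = FALSE, col = "gray", breaks = ceiling(sqrt(length(W))))
qqnorm(W)
qqline(W, col = "red", lwd = 2)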




Library mixtools

Another option which can be used to get the same results is the function predefined in the R package mixtools. Install the library by using install.packages("mixtools") and load the library by running the command

library("mixtools")

Within the mixtools library there is another ellipse() function defined (use the help session ?ellipse to see how it works).

rm(list = ls())
library(mixtools)
library(mvtnorm) 

set.seed(1234)
p <- rmvnorm(1000, c(250000, 20000), matrix(c(100000^2, 22000^2, 22000^2, 6000^2),2,2))
plot(p, pch=20, xlim=c(0,500000), ylim=c(0,50000), xlab="Margin No. 1", ylab="Margin No. 2")
ellipse(mu=colMeans(p), sigma=cov(p), alpha = .05, npoints = 250, col="red") 



Individual task


Can you reconstruct the contour from the figure above? How is the red ellipse calculated there?

alpha <- 0.05
angles <- seq(0, 2*pi, length = 200)
cpoints <- cbind(cos(angles), sin(angles))     ## points on the unit circle

q <- sqrt(qchisq(1 - alpha, df = 2))           ## radius given by the chi-square quantile
mu <- colMeans(p)
sigma <- cov(p)
sigmaEig <- eigen(sigma)
sigmaSqr <- sigmaEig$vectors %*% diag(sqrt(sigmaEig$values)) %*% t(sigmaEig$vectors)   ## symmetric square root of sigma
Epoints <-  q * (cpoints %*% sigmaSqr)         ## unit circle transformed into the ellipse
Epoints[,1] <- Epoints[,1] + mu[1]             ## shift by the sample mean vector
Epoints[,2] <- Epoints[,2] + mu[2]

plot(p, pch=20, xlim=c(0,500000), ylim=c(0,50000), xlab="Margin No.1", ylab="Margin No.2")
points(Epoints[,1], Epoints[,2], col = "red", pch = "*")



Library MASS

One more option for simulating a general multivariate normal sample is available within the library MASS by running the command mvrnorm() (again, use the help session ?mvrnorm to find out more details about this command). The library needs to be loaded by the standard command

library("MASS")

Next, we can generate some random samples from a two-dimensional normal distribution and, for visualisation purposes, we calculate the corresponding multivariate kernel density estimates (the theoretical details are omitted as they are not important here).

### two independent samples, each from a two-dimensional normal distribution
bivn1 <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2))
## note: the second covariance matrix is singular (only positive semi-definite),
## so the second sample is degenerate – its two margins are perfectly correlated
bivn2 <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1.5, 1.5, 1.5, 1.5), 2))

### two dimensional kernel density estimate
bivn.kde1 <- kde2d(bivn1[,1], bivn1[,2], n = 50)
bivn.kde2 <- kde2d(bivn2[,1], bivn2[,2], n = 50)

par(mfrow = c(1,2))
persp(bivn.kde1, phi = 45, theta = 30, shade = .1, border = NA)
#par(new=TRUE)
persp(bivn.kde2, phi = 45, theta = 30, shade = .1, border = NA)
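
The same kernel density estimates can also be displayed in a two-dimensional way (a small optional sketch using the standard image() and contour() functions, which accept the output of kde2d() directly):

par(mfrow = c(1,2))
image(bivn.kde1); contour(bivn.kde1, add = TRUE)
image(bivn.kde2); contour(bivn.kde2, add = TRUE)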



Comment


Remember that once we know the joint distribution function, all marginals are easily obtainable – one just needs to calculate some multivariate integrals, which is usually easily doable. On the other hand, if we know all the marginals, this is still not enough to reconstruct the whole joint (multivariate) distribution. Using this idea we can apply the following principle to generate a sample from some multivariate normal distribution, BUT we cannot control the overall dependence structure – the covariance matrix among the marginals.

library(ggplot2)
rm(list = ls())
set.seed(1234)

n <- 1000
sample1 <- rnorm(n, mean=2, sd = 2)
sample2 <- 0.2 * sample1 + rnorm(n)
data <- data.frame(x = sample1, y = sample2, group="A")

sample3 <- rnorm(n, mean=0, sd = 1)
sample4 <- 3*sample3 - 1 + rnorm(n)
data <- rbind(data, data.frame(x = sample3, y = sample4, group="B"))
ggplot(data, aes(x=x, y=y,colour=group)) + geom_point(size=1.5, alpha=.6) 

Checking independence by using the distribution functions:

xPoints <- pnorm(sample1, mean(sample1), sd(sample1))
yPoints <- pnorm(sample2, mean(sample2), sd(sample2))

plot(xPoints, yPoints, pch = 21, bg = "lightblue")
lines(lowess(yPoints ~ xPoints), col = "red", lwd = 2)

This can be compared with the figure obtained by plotting sample no. 1 against sample no. 3 (two independent samples).

xPoints <- pnorm(sample1, mean(sample1), sd(sample1))
yPoints <- pnorm(sample3, mean(sample3), sd(sample3))

plot(xPoints, yPoints, pch = 21, bg = "lightblue")
lines(lowess(yPoints ~ xPoints), col = "red", lwd = 2)

Comments & Questions


  • What do you expect to see in the two figures displayed above?
  • Do you understand the idea of plotting the values of the distribution functions evaluated at the random samples against each other?
  • For which distributions can this approach be used?
  • Why is it not a good idea to plot directly the values of the sample against each other?




Individual task assignment

(Deadline: Third lab session / 01.03.2022)

  • In the second section ‘’Random vectors (joint, marginal, conditional distributions)’’ solve Example 1 (theoretical) and use the R software to simulate the random sample in Example 2 (practical).

  • Place the solution of both examples on your website, on Tuesday, 01.03.2022, at 12:00 at the latest.