Full assignment in PDF

The solution to this problem will be shown at the exercise classes; hence, no report is required from students.

In this exercise we will demonstrate how to sample an MCMC via Just Another Gibbs Sampler (JAGS); download it from JAGS. Also, install library(runjags) to be able to run JAGS from R. The capabilities of the latest version of JAGS are described in the following manual.
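Assuming JAGS itself has already been installed on the system, the R-side packages can be installed in the usual way (rjags and coda are used below as well):

install.packages(c("runjags", "rjags", "coda"))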

Data description

The data file toenail.txt (values separated by spaces) comes from a longitudinal dermatological clinical study, whose main objective was to compare the efficacy of two treatments of toenail infection.

# Setting up the directories (change depending on your structure)
WD <- getwd()
DATA <- file.path(WD, "data")
FIG <- file.path(WD, "fig")
# Loading the data from directory DATA
toenail <- read.csv(file.path(DATA, "toenail.txt"), sep="")
head(toenail, 20)

It contains the following variables:

  * idnr — patient identification number,
  * visit — visit number of the given patient,
  * time — time of the visit in months (\(t_{i,j}\)),
  * trt — treatment indicator (\(x_i\)),
  * infect — binary indicator of the infection severity (\(Y_{i,j}\)).

Logistic regression with random effects

Let \(Y_{i,j}\) denote a random variable representing the indicator of the infection severity for the \(i\)th patient at the \(j\)th visit (\(i=1,\ldots,n\), \(j=1,\ldots,n_{i}\)), which was conducted at time \(t_{i,j}\) (in months). Let \(x_{i}\in\{0,\,1\}\) denote the treatment indicator of the \(i\)th patient.

Assume the following (hierarchical) model (genuine parameters and regressors are not indicated in the conditions when specifying the distributions): \[\begin{alignat*}{2} B_i &\sim \mathsf{N}(\beta_0,\,\tau_0^{-1}), &\quad &i=1,\ldots,n,\\[1ex] Y_{i,j}\,|\,B_i &\sim \mathcal{A}\Bigl(\pi(B_i)\Bigr), &\quad &i=1,\ldots,n,\;j=1,\ldots,n_i,\\[1ex] \log\biggl\{\frac{\pi(B_i)}{1 - \pi(B_i)}\biggr\} &= B_i + \beta_1\,x_i + \beta_2\,t_{i,j} + \beta_3\,x_i\,t_{i,j}, &\quad &i=1,\ldots,n,\;j=1,\ldots,n_i. \end{alignat*}\] In non-Bayesian terminology this would be logistic regression with a random intercept. As primary (genuine) parameters, consider the following: \[\begin{equation*} \boldsymbol{\psi} = \left(\boldsymbol{\beta}^\top,\,\tau_0\right)^\top,\qquad \boldsymbol{\beta} = \left(\beta_0,\,\beta_1,\,\beta_2,\,\beta_3\right)^\top. \end{equation*}\] Assume the following prior distribution for the primary parameters: \[\begin{gather*} p( \boldsymbol{\beta},\,\tau_0) = p( \boldsymbol{\beta})\,p(\tau_0), \\[1ex] \boldsymbol{\beta} \sim \mathsf{N}_4\bigl(\boldsymbol{0},\,\mbox{diag}(10^2,\ldots,10^2)\bigr), \;\;\tau_0 \sim \Gamma(1,\,0.005). \end{gather*}\] As latent data, consider the random-effects values \(\mathbf{B} = \left(B_1,\ldots,B_n\right)^\top\).

Full assignment

  1. Derive (just by hand on paper) the full-conditional densities (just the core of each density, known up to a multiplicative constant) needed to implement a Gibbs algorithm that would generate, in individual steps, from the full conditionals of \(\beta_0\), \(\boldsymbol{\beta}_{-0} = (\beta_1, \beta_2, \beta_3)^\top\), \(\tau_0\) and \(\mathbf{B}\).

Next, answer the following questions:

  2. Implement the above model in JAGS and, using library(rjags), generate two Markov chains whose limiting distribution is the posterior distribution for the model under consideration.

  3. Draw the trajectories (traceplots) for the primary parameters of the model and also for the model deviance; it is necessary to add "deviance" among the monitored parameters. Next, monitor also the following variables: "pd", "popt", "dic", "ped" (their meaning will be explained later). Draw both chains in one plot with two different colors. Draw the estimates of the autocorrelation functions (for at least one of the generated chains).

Assess whether convergence of the Markov chain to the limiting distribution can be assumed and whether the chain exhibits acceptable autocorrelation.

  4. Assess whether, given the variability of the posterior distribution for \(\boldsymbol{\beta}\), the prior distribution used for \(\boldsymbol{\beta}\) can be considered weakly informative.

  5. Calculate the basic characteristics of the posterior distribution for the following parameters: \(d_0 = \tau_0^{-1/2}\) (the standard deviation of the random effects), \(\gamma_1 = \beta_2\) (the effect of time under treatment A), \(\gamma_2 = \beta_2 + \beta_3\) (the effect of time under treatment B), and \(\gamma_3(t) = \beta_1 + t\,\beta_3\) (the difference between the treatments at time \(t\)).

  6. For the above-defined parameters \(d_0\), \(\gamma_1\), \(\gamma_2\), \(\gamma_3\), calculate 95% credible intervals (ET as well as HPD) and plot estimates of the posterior densities.

  7. For the parameter \(\gamma_3\), calculate (using the generated Markov chain) the value \(p\) which satisfies \[\begin{equation*} p = \inf\bigl\{\alpha:\,0\notin C(\alpha)\bigr\}, \end{equation*}\] where \(C(\alpha)\) is the \((1 - \alpha)100\)% ET credible interval for \(\gamma_3\).

Remember that the calculated \(p\) can be interpreted as a P-value of a test of the null hypothesis \(\gamma_3 = 0\).

Task 1 - Full-conditional distributions - preparation for the Gibbs algorithm

For the purpose of this task we introduce the notation \(\boldsymbol{\beta}_{-0} = (\beta_1, \beta_2, \beta_3)^\top\), which excludes the coefficient \(\beta_0\) because it enters the model through the prior for the random effects. Moreover, let \(\mathbf{z}_{ij} = (x_i, t_{ij}, x_i t_{ij})^\top\).

First, we need to write down all the pdfs:

\[\begin{align*} p(\beta_0) &\propto \exp\left\{- \frac{1}{2} \frac{1}{10^2} \beta_0^2\right\}, \\ p(\boldsymbol{\beta}_{-0}) &\propto \exp\left\{- \frac{1}{2} \frac{1}{10^2} \|\boldsymbol{\beta}_{-0}\|^2 \right\}, \\ p(\tau_0) &\propto \tau_0^{1-1} \exp\{-0.005 \tau_0\}, \\ p(\mathbf{B} | \beta_0, \tau_0) &\propto \prod\limits_{i=1}^n \sqrt{\frac{\tau_0}{2\pi}} \exp \left\{- \frac{\tau_0}{2} (B_i - \beta_0)^2 \right\} \propto \tau_0^{\frac{n}{2}} \exp \left\{- \frac{\tau_0}{2} \sum\limits_{i=1}^n (B_i - \beta_0)^2 \right\}, \\ p\left(\mathbb{Y} | \mathbf{B}, \boldsymbol{\beta}_{-0}\right) &\propto \prod\limits_{i=1}^n \prod\limits_{j=1}^{n_i} \left[\sigma(B_i + \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{Y_{ij}} \left[\sigma(-B_i - \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{1 - Y_{ij}}, \end{align*}\] where \(\sigma(z) = \dfrac{1}{1+ \exp(-z)}\) is the sigmoid function (inverse function to logit).

The full posterior is, by Bayes' theorem, proportional to \[ p(\mathbf{B}, \boldsymbol{\beta}, \tau_0 | \mathbb{Y}) \,\propto\, p\left(\mathbb{Y} | \mathbf{B}, \boldsymbol{\beta}_{-0}\right) \, p(\mathbf{B} | \beta_0, \tau_0) \, p(\tau_0) \, p(\boldsymbol{\beta}_{-0}) \, p(\beta_0) \] and so are the full-conditional distributions: \[\begin{align*} p(\beta_0 | \cdots) &\propto \exp\left\{- \frac{1}{2} \left[\frac{1}{10^2} \beta_0^2 + \tau_0 \sum\limits_{i=1}^n (B_i - \beta_0)^2\right]\right\}, \\ p(\boldsymbol{\beta}_{-0} | \cdots) &\propto \exp\left\{- \frac{1}{2} \frac{1}{10^2} \|\boldsymbol{\beta}_{-0}\|^2 \right\} \prod\limits_{i=1}^n \prod\limits_{j=1}^{n_i} \left[\sigma(B_i + \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{Y_{ij}} \left[\sigma(-B_i - \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{1 - Y_{ij}}, \\ p(\tau_0 | \cdots) &\propto \tau_0^{\frac{n}{2} + 1-1} \exp\left\{-\tau_0 \left[0.005 + \frac{1}{2}\sum\limits_{i=1}^n (B_i - \beta_0)^2\right]\right\}, \\ p(\mathbf{B} | \cdots) &\propto \prod\limits_{i=1}^n \left[ \exp \left\{- \frac{\tau_0}{2} (B_i - \beta_0)^2 \right\} \prod\limits_{j=1}^{n_i} \left[\sigma(B_i + \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{Y_{ij}} \left[\sigma(-B_i - \boldsymbol{\beta}_{-0}^\top \mathbf{z}_{ij})\right]^{1 - Y_{ij}} \right]. \end{align*}\]

For \(\beta_0\) we see that on the log scale we have a quadratic function of \(\beta_0\), which means that the full-conditional distribution of \(\beta_0\) is a normal distribution. In particular, \[ \beta_0 \left|\, \mathbb{Y}, \mathbf{B}, \tau_0 \right. \sim \mathsf{N} \left( \dfrac{\frac{1}{10^2} 0 + \tau_0 n \overline{B}_n}{\tau^\star}, \left[\tau^\star\right]^{-1} \right), \qquad \text{where} \quad \tau^\star = \frac{1}{10^2} + \tau_0 n. \] Then, the full-conditional distribution of \(\tau_0\) is a gamma distribution, \[ \tau_0 |\, \mathbf{B}, \beta_0 \sim \Gamma\left(1+\frac{n}{2},\, 0.005 + \frac{1}{2}\sum\limits_{i=1}^n (B_i - \beta_0)^2\right). \] However, for the remaining parameters we clearly cannot recognize the pdf of any well-known distribution, because they appear both inside \(\exp(\cdot)\) and inside \(\sigma(\cdot)\). Hence, approximate methods of sampling from these distributions have to be used. Luckily, JAGS is able to do that for us, see the manual.

The only thing we can say about the full-conditional distribution of \(\mathbf{B}\) is that it factorizes into \(n\) independent blocks, one for each random intercept.
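To make the two conjugate updates concrete, here is a minimal R sketch of one Gibbs scan for \(\beta_0\) and \(\tau_0\) given current values of \(\mathbf{B}\) (our own illustration, not part of the assignment; the updates of \(\boldsymbol{\beta}_{-0}\) and \(\mathbf{B}\) would require, e.g., Metropolis steps, which JAGS handles for us):

# One Gibbs scan for the two conjugate full conditionals derived above
gibbs_conjugate_step <- function(B, tau0){
  n <- length(B)
  # beta0 | Y, B, tau0 ~ N(tau0 * n * mean(B) / tau.star, 1 / tau.star)
  tau.star <- 1 / 10^2 + tau0 * n
  beta0 <- rnorm(1, mean = tau0 * n * mean(B) / tau.star, sd = 1 / sqrt(tau.star))
  # tau0 | B, beta0 ~ Gamma(1 + n/2, 0.005 + sum((B - beta0)^2) / 2)
  tau0 <- rgamma(1, shape = 1 + n / 2, rate = 0.005 + sum((B - beta0)^2) / 2)
  c(beta0 = beta0, tau0 = tau0)
}
gibbs_conjugate_step(B = rnorm(294), tau0 = 1)   # one scan with dummy random effects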

Task 2 - Implementation of Gibbs sampling with JAGS

First, we need to prepare the data for JAGS and be careful about indexing within the given data:

table(toenail[, "idnr"])                       # n_i - observations per ID
(nsubj <- length(unique(toenail[, "idnr"])))   # 294 unique IDs
# Not all numbers between 1 - 383 are used in the idnr column!

# Create new IDs with values 1 - 294  
toenail <- toenail[order(toenail[, "idnr"], toenail[, "visit"]), ] # order observations by id and by visit
tidnr <- table(toenail[, "idnr"])                                  # n_i ordered by idnr
toenail[, "id"] <- rep(1:nsubj, tidnr)                             # replicate new ID n_i times
tail(toenail)                                                      # last i: idnr=383, but id=294

(n <- nrow(toenail))                                               # total number of observations
# Definition of a list that will be given to JAGS
# Variables within JAGS will be called by these names
Data <- list(n = n,                       # sum(n_i)
             nsubj = nsubj,               # n
             id = toenail[, "id"],        # id
             y = toenail[, "infect"],     # binary response Y_{ij}
             trt = toenail[, "trt"],      # binary covariate x_i
             time = toenail[, "time"])    # time covariate t_{ij}

Next, we need to implement the hierarchical model for JAGS, which will be one very long string. This is the place to describe how the data are generated, including the prior distributions. The distributions are defined in JAGS terminology and parametrization, which may differ from those used in R!

Distributions are assigned via the stochastic relation ~, while deterministic relations (for auxiliary quantities) are given by <-. Overall, the notation resembles R code:

Model <- "
model{
  # Likelihood: logistic regression with a subject-specific intercept b[id[i]]
  for (i in 1:n){
    eta[i] <- b[id[i]] + betaTrt * trt[i] + betaTime * time[i] + betaTrtTime * trt[i] * time[i]
    pi[i] <- exp(eta[i]) / (1 + exp(eta[i]))   # inverse logit
    y[i] ~ dbern(pi[i])
  }

  # Random intercepts; dnorm is parametrized by mean and PRECISION
  for (i in 1:nsubj){
    b[i] ~ dnorm(beta0, tau0)
  }

  # Priors: precision 0.01 corresponds to variance 10^2
  beta0 ~ dnorm(0, 0.01)
  betaTrt ~ dnorm(0, 0.01)
  betaTime ~ dnorm(0, 0.01)
  betaTrtTime ~ dnorm(0, 0.01)
  tau0 ~ dgamma(1, 0.005)

  # Derived quantity: standard deviation of the random intercepts
  d0 <- 1 / sqrt(tau0)
}
"

Within this definition we use the following notation:

  * b[i] = \(B_i\), beta0 = \(\beta_0\), tau0 = \(\tau_0\),
  * betaTrt = \(\beta_1\), betaTime = \(\beta_2\), betaTrtTime = \(\beta_3\),
  * d0 = \(\tau_0^{-1/2}\), the standard deviation of the random intercepts,
  * other variables (y, n, nsubj, id, trt, time) are taken from the list Data.

Now we need to propose starting values for the model parameters. The values should be given as a list with names corresponding to the parameter names. Suitable initial values can be found by a frequentist approach with a simpler model.

# Logistic regression without random effects (assuming independence among all observations)
m0 <- glm(infect ~ trt*time, family = binomial(link = logit), data = toenail)
summary(m0)
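The estimated coefficients can serve as a sanity check for the initial values chosen below:

round(coef(m0), 2)   # (Intercept), trt, time, trt:time ~ beta0, betaTrt, betaTime, betaTrtTime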

# Each chain is started with different values (list of lists)
# .RNG.... is setting up the seed for each chain + type of the random number generator
# ... for reproducibility purposes
Init <- list(list(.RNG.seed = 20251110, .RNG.name = "lecuyer::RngStream",
                  beta0 = -0.5, betaTrt = 0,     betaTime = -0.2, betaTrtTime = 0,   tau0 = 1),
             list(.RNG.seed = 20251117, .RNG.name = "lecuyer::RngStream",
                  beta0 = 0,    betaTrt = -0.05, betaTime = -0.4, betaTrtTime = 0.1, tau0 = 2))
# Parameters to monitor (including deviance and related metrics)
parmDev <- c("deviance", "pd", "popt", "dic", "ped")
parameters.toenail <- c(parmDev, "beta0", "betaTrt", "betaTime", "betaTrtTime", "d0")

# Libraries that are essential to run JAGS via R
library(runjags)
library(coda)              # environment for monitoring MCMC samples


# Suppress administrative output (just try it)
# runjags.options(silent.runjags = TRUE, silent.jags = FALSE)


# Set runjags to use multiple cores in parallel
# library(parallel)
# my_cluster <- makeCluster(2)
# runjags.options(method = "rjparallel")
runjags.options(method = "rjags")           # sets it back to run everything on just one core

Finally, everything is ready. To sample the MCMC we just need to call run.jags(). For reproducibility reasons we first set the seed. Then we call run.jags(), where we declare the number of chains, the length of the burn-in, etc. Sampling took about 1 minute on a decent PC. The MCMC length below is perhaps too short; the sample should be at least 10 times longer, i.e., for real results use at least sample = 30000, and perhaps the burnin should also be slightly longer.

# Comment/uncomment with Ctrl+Shift+C
set.seed(20010911)
jagsLogit <- run.jags(model = Model,                  # model definition as a string
                      monitor = parameters.toenail,   # vector of monitored parameters
                      data = Data,                    # list of used data 
                      inits = Init,                   # list of initial values for each chain
                      adapt = 1000,                   # number of adaptive iterations (at the start)
                      n.chains = 2,                   # number of chains
                      thin = 1,                       # thinning (keep only every thin-th value)
                      # method = "parallel",          # parallel sampling of the chains
                      # cl = my_cluster,              # computation on more cores
                      burnin = 1500,                  # length of the burnin period
                      sample = 5000)                  # the length of the sample used for inference

# stopCluster(my_cluster)

# After time-consuming sampling, it is always a good idea to save the sampled values for future use
save(list = "jagsLogit", file = file.path(DATA, "mcmc_toenail.RData"))
load(file.path(DATA, "mcmc_toenail.RData"))

Let’s explore the contents of jagsLogit:

# str(jagsLogit)                   # quite a long list of computed values
attributes(jagsLogit)
# The sampled states are within `$mcmc` - which is a list for each chain
class(jagsLogit$mcmc)              # mcmc.list - class from coda package
head(jagsLogit$mcmc[[1]])          # first 6 sampled states in first chain (after burnin)
jagsLogit$burnin
jagsLogit$sample
jagsLogit$thin
jagsLogit$model
jagsLogit$monitor
jagsLogit$timetaken
jagsLogit$end.state                # last sampled values for each parameter
# Could be used for continuation in sampling, if insufficient
jagsLogit2 <- run.jags(model = Model, monitor = parameters.toenail, data = Data,  
                       inits = jagsLogit$end.state,    # starting a new chain from the last value
                       adapt = 0,                      # already adapted
                       n.chains = length(jagsLogit$mcmc), # the same as before
                       thin = jagsLogit$thin,          # the same as before
                       burnin = 0,                     # no burning required
                       sample = 100)
head(jagsLogit2$mcmc[[1]])

Basic posterior summary

print(jagsLogit, digits = 3)         # runjags summaries
jagsLogit$summary$statistics         # coda posterior Mean, SD, Naive SE, corrected SE
jagsLogit$summary$quantiles          # coda posterior quantiles
jagsLogit$summaries                  # combination + MCMC diagnostic parameters

# print(jagsLogit$deviance.table)    # contribution to deviance by each observation (too long) 
print(jagsLogit$deviance.sum)        # deviance and related quantities 

Task 3 - Trajectories and assessment of convergence

A brief check of convergence and of the properties of the chain is provided by:

  * traceplots,
  * posterior CDFs based on the different chains,
  * histograms of the sampled values,
  * the autocorrelation function.

The author of the runjags package provides plotting functions designed for these outputs.

# plot(jagsLogit, layout = c(2, 2)) # all plots together (too much)

# This way we can plot all types of plots for each monitored variable separately:
plot(jagsLogit, vars = "beta0")
plot(jagsLogit, vars = "betaTrt")
plot(jagsLogit, vars = "betaTime")
plot(jagsLogit, vars = "betaTrtTime")
plot(jagsLogit, vars = "d0")
plot(jagsLogit, vars = "deviance")

Only traceplots for selected parameters

Parm <- c("beta0", "betaTrt", "betaTime", "betaTrtTime", "d0", "deviance")
plot(jagsLogit, vars = Parm, plot.type = "trace", layout = c(2, 3))

Only ECDF for selected parameters

plot(jagsLogit, vars = Parm, plot.type = "ecdf", layout = c(2, 3))

Only histograms for selected parameters

plot(jagsLogit, vars = Parm, plot.type = "histogram", layout = c(2, 3))

Only autocorrelation function (ACF) for selected parameters

plot(jagsLogit, vars = Parm, plot.type = "autocorr", layout = c(2, 3))
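A numeric complement to the ACF plots is the effective sample size, i.e., roughly how many independent draws the autocorrelated sample is worth:

effectiveSize(jagsLogit$mcmc)           # see help(coda::effectiveSize) for details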

Crosscorrelations between chains of different parameters

plot(jagsLogit, plot.type = "crosscorr")

Gelman-Rubin convergence diagnostic

gelman.diag(jagsLogit$mcmc)            # see help(coda::gelman.diag) for details
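Values of the potential scale reduction factor close to 1 indicate convergence. Its evolution along the chain can also be plotted:

gelman.plot(jagsLogit$mcmc)            # see help(coda::gelman.plot) for details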

Geweke’s diagnostic (for beta’s only, separate set of plots for each chain)

geweke.diag(jagsLogit$mcmc)             # see help(coda::geweke.diag) for details

par(mfrow = c(2, 3))
geweke.plot(jagsLogit$mcmc[[1]])        # see help(coda::geweke.plot) for details
geweke.plot(jagsLogit$mcmc[[2]])        

Task 4 - Exploring the posterior using MCMC samples

Let us now focus more on the estimates for regression coefficients. The basic summary is found here:

jagsLogit$summary

The posterior SDs for the beta coefficients can be extracted as follows:

beta_names <- rownames(jagsLogit$summary$statistics)[grep("^beta", rownames(jagsLogit$summary$statistics))]
(beta_post_sd <- jagsLogit$summary$statistics[beta_names, "SD"])
10 / beta_post_sd     # How many times the prior SD (10) is larger than the posterior SD

We clearly see that the prior distribution has much larger variability and thus can be considered weakly informative (the prior does not determine the results).

The credible intervals included within the summary are actually Equal-Tailed (ET) credible intervals. These are constructed simply by applying quantile with probabilities \(\frac{\alpha}{2}\) and \(1-\frac{\alpha}{2}\) to the relevant samples. Compare for yourself:

(betaET <- jagsLogit$summary$quantiles[beta_names, paste0(c(2.5, 97.5),"%")])
apply(jagsLogit$mcmc[[1]][, beta_names], 2, quantile, prob = c(0.025, 0.975)) # based only on chain 1
beta_ET_manual <- t(apply(rbind(jagsLogit$mcmc[[1]][, beta_names], 
                                jagsLogit$mcmc[[2]][, beta_names]),
                          2, quantile, prob = c(0.025, 0.975)))
all.equal(betaET, beta_ET_manual) # all values are equal up to a tolerance of about 1e-8

Highest Posterior Density (HPD) credible intervals are already calculated and available in the output.

print(jagsLogit$hpd)
print(jagsLogit$HPD)           # the same
HPDinterval(jagsLogit$mcmc)    # computation for each chain separately

See help(coda::HPDinterval) for some details. For each parameter the interval is constructed from the ECDF of the sample as the shortest interval for which the difference in the ECDF values of the endpoints is the nominal probability. Assuming that the distribution is not severely multimodal, this is the HPD interval.
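The same idea can be written in a few lines of R (a minimal sketch assuming a unimodal posterior; my_hpd is our own helper, not a coda function):

my_hpd <- function(x, prob = 0.95){
  sx <- sort(x)
  n <- length(sx)
  k <- floor(n * prob)                     # ECDF difference of k/n ~ prob between endpoints
  width <- sx[(k + 1):n] - sx[1:(n - k)]   # widths of all candidate intervals
  i <- which.min(width)                    # index of the shortest one
  c(lower = sx[i], upper = sx[i + k])
}
my_hpd(c(jagsLogit$mcmc[[1]][, "beta0"], jagsLogit$mcmc[[2]][, "beta0"]))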

Once the samples are cleared of high autocorrelation (higher thin), basically any approach for approximating a characteristic of a distribution can be used to approximate the same characteristic of the posterior distribution. For example, let us use kernel density estimates for the coefficients:

COL <- c("blue", "red")
par(mfrow = c(2,2), mar = c(4,4,2,0.5))
for(beta in beta_names){
  dens <- lapply(jagsLogit$mcmc, function(mcmc){density(mcmc[,beta])})
  rangex <- range(unlist(lapply(dens, function(d){range(d$x)})))
  rangey <- range(unlist(lapply(dens, function(d){range(d$y)})))
  
  plot(0,0,type = "n", xlim = rangex, ylim = rangey,
       xlab = beta, ylab = "Density", main = beta)
  for(i in 1:length(dens)){
    lines(dens[[i]], col = COL[i], lty = 2)
    abline(v = jagsLogit$summary$quantiles[beta,c("2.5%", "97.5%")], col = "darkgreen", lty = 3)
    abline(v = jagsLogit$HPD[beta, c("Lower95", "Upper95")], col = "goldenrod", lty = 3)
  }
  legend("topleft", legend = 1:length(dens), title = "Chain", col = COL, lty = 2, bty = "n")
  legend("topright", c("ET", "HPD"), title = "95% CI", col = c("darkgreen", "goldenrod"), lty = 3, bty = "n")
}

Or even a bivariate version including the contours:

library(MASS)
# ?MASS::kde2d
dens <- lapply(jagsLogit$mcmc, function(mcmc){kde2d(x = mcmc[,"betaTime"], 
                                                    y = mcmc[,"betaTrtTime"], 
                                                    n = 40)})
# rangex <- range(unlist(lapply(dens, function(d){range(d$x)})))
# rangey <- range(unlist(lapply(dens, function(d){range(d$y)})))
rangez <- range(unlist(lapply(dens, function(d){range(d$z)})))

par(mfrow = c(1,2), mar = c(4,4,2,0.5))
for(ch in 1:length(dens)){
  image(dens[[ch]], 
        zlim = rangez, # xlim = rangex, ylim = rangey, 
        main = paste("Chain", ch), xlab = "betaTime", ylab = "betaTrtTime")
  contour(dens[[ch]], add = TRUE)
}

Task 5+6 - Additional parameters of interest

When we read the assignment carefully, we decode that:

  * \(d_0 = \tau_0^{-1/2}\) = d0,
  * \(\gamma_1 = \beta_2\) = betaTime,
  * \(\gamma_2 = \beta_2+\beta_3\) = betaTime + betaTrtTime,
  * \(\gamma_3(t) = \beta_1+ t\,\beta_3\) = betaTrt + t * betaTrtTime (treatment B compared to A), which depends on the time \(t\).

The first two parameters are already included among our sampled states. Their posterior summary is already at our disposal within the jagsLogit object.

summary(jagsLogit, vars = c("d0", "betaTime"))

The posterior density estimate + credible intervals (based on both chains):

COL <- c("blue", "red")
par(mfrow = c(1,2), mar = c(4,4,2,0.5))
for(var in c("d0", "betaTime")){
  dens <- lapply(jagsLogit$mcmc, function(mcmc){density(mcmc[,var])})
  rangex <- range(unlist(lapply(dens, function(d){range(d$x)})))
  rangey <- range(unlist(lapply(dens, function(d){range(d$y)})))
  
  plot(0,0,type = "n", xlim = rangex, ylim = rangey,
       xlab = var, ylab = "Density", main = var)
  for(i in 1:length(dens)){
    lines(dens[[i]], col = COL[i], lty = 2)
    abline(v = jagsLogit$summary$quantiles[var,c("2.5%", "97.5%")], col = "darkgreen", lty = 3)
    abline(v = jagsLogit$HPD[var,c("Lower95", "Upper95")], col = "goldenrod", lty = 3)
  }
  legend("topleft", legend = 1:length(dens), title = "Chain", col = COL, lty = 2, bty = "n")
  legend("topright", c("ET", "HPD"), title = "95% CI", col = c("darkgreen", "goldenrod"), lty = 3, bty = "n")
}

The other parameters are not included among the monitored parameters (except for time t = 0, when \(\gamma_3(0) = \beta_1\) = betaTrt). One way to obtain all the necessary estimates would be to include them in the Model definition, the same way d0 was. But then we would have to run the sampling algorithm once again, which is time- and energy-consuming.
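For completeness, inside the model string this would amount to adding deterministic nodes such as the following (JAGS syntax, here for a fixed time t = 5; a sketch only, we do not re-run the sampler):

  gamma2 <- betaTime + betaTrtTime
  gamma3t <- betaTrt + 5 * betaTrtTime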

However, we already have the samples at our disposal. The quantities of interest are simple transformations of the other available parameters, so their samples are the corresponding deterministic transformations of the available samples.

t <- 5                 # chosen value for time
my_mcmc <- list()
for(ch in 1:2){
  my_mcmc[[ch]] <- data.frame(jagsLogit$mcmc[[ch]][,"betaTime"] + jagsLogit$mcmc[[ch]][,"betaTrtTime"],
                              jagsLogit$mcmc[[ch]][,"betaTrt"] + t * jagsLogit$mcmc[[ch]][,"betaTrtTime"])
  colnames(my_mcmc[[ch]]) <- c("gamma2", "gamma3t")
  start_end <- attr(jagsLogit$mcmc[[ch]], "mcpar")
  my_mcmc[[ch]] <- mcmc(my_mcmc[[ch]],         # creating my own class(mcmc)
                        start = start_end[1],  # jagsLogit$burnin + 1
                        end = start_end[2],    # jagsLogit$burnin + jagsLogit$sample
                        thin = jagsLogit$thin) # start_end[3]
}
class(my_mcmc)
my_mcmc <- as.mcmc.list(my_mcmc)      # now of class mcmc.list
(sum_my_mcmc <- summary(my_mcmc))     # uses corresponding summary method for class mcmc.list

With the new mcmc.list object we can use all the methods directly, just as for any other mcmc object.

par(mfrow = c(1,2), mar = c(4,4,2,0.5))
plot(my_mcmc, trace = TRUE, density = FALSE)

It remains to compute HPD credible intervals (ET already within the summary).

# chains separately
HPDinterval(my_mcmc)
# both chains together
(gammaHPD <- HPDinterval(as.mcmc(rbind(my_mcmc[[1]], my_mcmc[[2]]))))
COL <- c("blue", "red")
par(mfrow = c(1,2), mar = c(4,4,2,0.5))
for(var in c("gamma2", "gamma3t")){
  dens <- lapply(my_mcmc, function(mcmc){density(mcmc[,var])})
  rangex <- range(unlist(lapply(dens, function(d){range(d$x)})))
  rangey <- range(unlist(lapply(dens, function(d){range(d$y)})))
  
  plot(0,0,type = "n", xlim = rangex, ylim = rangey,
       xlab = var, ylab = "Density", main = var)
  for(i in 1:length(dens)){
    lines(dens[[i]], col = COL[i], lty = 2)
    abline(v = sum_my_mcmc$quantiles[var,c("2.5%", "97.5%")], col = "darkgreen", lty = 3)
    abline(v = gammaHPD[var,c("lower", "upper")], col = "goldenrod", lty = 3)
  }
  legend("topleft", legend = 1:length(dens), title = "Chain", col = COL, lty = 2, bty = "n")
  legend("topright", c("ET", "HPD"), title = "95% CI", col = c("darkgreen", "goldenrod"), lty = 3, bty = "n")
}

Once we have samples of the model parameters, exploring the posterior of any parametric function is only a matter of transformation.
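For instance, one further parametric function of potential interest is the infection probability itself. The following sketch (our own illustration, not part of the assignment) evaluates the posterior of \(\pi\) for a patient with a typical random intercept (\(B_i\) set to its prior mean \(\beta_0\)) in treatment arm x at time t:

post_prob <- function(ch, x = 1, t = 5){
  s <- as.matrix(ch)                       # chain samples as a plain matrix
  eta <- s[, "beta0"] + s[, "betaTrt"] * x +
         s[, "betaTime"] * t + s[, "betaTrtTime"] * x * t
  plogis(eta)                              # sigma(eta), the sigmoid
}
summary(post_prob(jagsLogit$mcmc[[1]], x = 1, t = 5))   # posterior draws of pi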

Task 7 - Probability resembling p-value

The goal is to find the smallest \(\alpha\) for which the \((1-\alpha)100\)% ET credible interval does not contain 0. We will showcase this on \(\gamma_3(t)\).

Let's start slowly by finding the posterior median:

(gamma3t_med <- sum_my_mcmc$quantiles["gamma3t", "50%"])
(which_side <- ifelse(gamma3t_med < 0, "upper", "lower"))

Now we know which end of the interval (lower or upper) will touch 0. Let us formulate a function that measures how far the upper bound is from zero.

# First, for simplicity, take all samples of the parameter into one vector
gamma3t <- c(my_mcmc[[1]][,"gamma3t"], my_mcmc[[2]][,"gamma3t"])

dist_ET_from_0 <- function(alpha){
  quantile(gamma3t, prob = 1-alpha/2) - 0 # not a true distance (can be negative)
}

The Bayesian version of the p-value is now the root of the defined function:

gamma3t_pvalue <- uniroot(dist_ET_from_0, interval = c(0, 1))
gamma3t_pvalue$root
# check, whether it truly satisfies the property
quantile(gamma3t, probs = c(gamma3t_pvalue$root/2, 1-gamma3t_pvalue$root/2))

Let me summarize the process into a function:

bayes_pvalue <- function(x, x0 = 0){
  medx <- median(x)
  rangex <- range(x)
  if((rangex[1]-x0) * (rangex[2]-x0) > 0){    # same signs (all sampled values on the same side from x0)
    return(0)                                 # we are pretty sure, parameter x is far away from x0
  }else{                                      # opposite signs (x0 between sampled values)
    if(medx < x0){
      dist_ET_from_x0 <- function(alpha){
        quantile(x, prob = 1-alpha/2) - x0
      }
    }else{
      dist_ET_from_x0 <- function(alpha){
        quantile(x, prob = alpha/2) - x0
      }
    }
    root <- uniroot(dist_ET_from_x0, interval = c(0, 1))
    return(root$root)
  }
}

Let's check whether it works (on both chains):

# betaTrt
jagsLogit$summary$quantiles["betaTrt",]
bayes_pvalue(c(jagsLogit$mcmc[[1]][,"betaTrt"], jagsLogit$mcmc[[2]][,"betaTrt"]))

# betaTime
jagsLogit$summary$quantiles["betaTime",]
bayes_pvalue(c(jagsLogit$mcmc[[1]][,"betaTime"], jagsLogit$mcmc[[2]][,"betaTime"]))
bayes_pvalue(c(jagsLogit$mcmc[[1]][,"betaTime"], jagsLogit$mcmc[[2]][,"betaTime"]), -0.3)

# betaTrtTime
jagsLogit$summary$quantiles["betaTrtTime",]
bayes_pvalue(c(jagsLogit$mcmc[[1]][,"betaTrtTime"], jagsLogit$mcmc[[2]][,"betaTrtTime"]))