The goal of this exercise is to introduce MCMC samplers that serve as alternatives to JAGS. We will demonstrate their use on the following example.
Consider a time series \[ Y_t = a_0 + a_1 Y_{t-1} +\varepsilon _t, \quad t = 1, \ldots, T, \] where \(\varepsilon_t | Y_1, \ldots, Y_{t-1} \sim \mathsf{N}(0, \sigma_t^2)\) and \[ \sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2. \] Parameters \(a_0, a_1 \in \mathbb{R}\) are unrestricted; however, \(\alpha_0, \alpha_1, \beta_1 \geq 0\) (so that the variance \(\sigma_t^2 \geq 0\)) and \(\alpha_1 + \beta_1 < 1\) (stationarity condition). The model is called \(\mathsf{AR}(1)\)-\(\mathsf{GARCH}(1,1)\), since the mean follows an Auto-Regressive sequence of order 1 and the variance follows a Generalized Auto-Regressive Conditional Heteroscedasticity recursion of orders 1 and 1.
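To get a feeling for the model, the following short sketch (our own illustration; the function simARGARCH and the chosen parameter values are not part of the analysis below) simulates one path, using the initial values \(Y_0 = 0\), \(\varepsilon_0 = 0\), \(\sigma_0^2 = 1\) that we will also use in the likelihood:
# Simulate one path of AR(1)-GARCH(1,1); parameter values chosen only for illustration
simARGARCH <- function(T, a0, a1, alpha0, alpha1, beta1){
  y <- sigma2 <- numeric(T)
  ylast <- 0; epslast <- 0; s2last <- 1                     # Y_0 = 0, eps_0 = 0, sigma_0^2 = 1
  for(t in 1:T){
    sigma2[t] <- alpha0 + alpha1*epslast^2 + beta1*s2last   # GARCH(1,1) variance
    eps <- rnorm(1, 0, sqrt(sigma2[t]))                     # eps_t | past ~ N(0, sigma_t^2)
    y[t] <- a0 + a1*ylast + eps                             # AR(1) mean
    ylast <- y[t]; epslast <- eps; s2last <- sigma2[t]
  }
  return(y)
}
set.seed(1)
plot(simARGARCH(1000, a0 = 0, a1 = 0.01, alpha0 = 1e-5, alpha1 = 0.1, beta1 = 0.8),
     type = "l", xlab = "t", ylab = expression(Y[t]))
Note that \(\alpha_1 + \beta_1 = 0.9 < 1\) here, so the simulated path is stationary and exhibits the volatility clustering described next.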
The model is used for modelling log-returns of assets in financial markets when the volatility changes over time and exhibits clustering (large price changes tend to be followed by large changes, and small changes by small changes). We will work with the German DAX index data available in R:
data(EuStockMarkets)
dax <- diff(log(EuStockMarkets[,1]))

# local standard deviation computed in a moving window of 2*step + 1 observations
LokSD <- function(x, step = 20){
  RET <- numeric(length(x) - 2*step)
  for(i in 1:(length(x) - 2*step)){
    RET[i] <- sd(x[i:(i + 2*step)])
  }
  return(RET)
}
step <- 15
loksds <- LokSD(dax, step)

par(mfrow = c(1,1), mar = c(4,4.2,0.5,0.5))
plot(dax, col = "slategrey",
     xlim = c(1991.6, 2001), xaxt = "n",
     ylim = c(-0.06, 0.06), ylab = expression(log(DAX[t]/DAX[t-1])))
axis(1, at = c(1992:1999), labels = c(1992:1999), las = 1)
lines(loksds ~ time(dax)[(step+1):(length(dax)-step)], col = "red")
lines(-loksds ~ time(dax)[(step+1):(length(dax)-step)], col = "red")
abline(h = 0, col = "darkviolet", lty = 2)
abline(v = 1999, col = "darkviolet", lty = 2)
# kernel density estimate of the log-returns, drawn vertically at the year 1999
dens <- density(dax)
koef <- 0.04
lines(x = 1999 + dens$y*koef, y = dens$x, col = "skyblue", lwd = 2)
lines(x = 1999 + koef*dnorm(dens$x, mean(dax), sd(dax)), y = dens$x, col = "orange", lty = 2)
legend("topright", c(expression(N(hat(mu),hat(sigma)^2)), "Density"),
       col = c("orange","skyblue"), lty = c(2,1), lwd = c(1,2), bty = "n")
legend("bottomright", c("DAX", "Local SD"),
       col = c("slategrey","red"), lty = 1, lwd = 1, bty = "n")

Let us denote by \(\boldsymbol{\theta} = (a_0, a_1, \alpha_0, \alpha_1, \beta_1)^\top\) the vector of unknown parameters. By the chain rule, the likelihood for this model is \[ p(\mathbf{Y} \,|\, \boldsymbol{\theta}) = p(Y_1 | Y_0; \boldsymbol{\theta}) \, p(Y_2 | Y_1; \boldsymbol{\theta}) \, \cdots p(Y_T | Y_{T-1}; \boldsymbol{\theta}) = \prod\limits_{t=1}^T (2\pi \sigma_t^2)^{-\frac{1}{2}} \exp \left\{- \frac{\varepsilon_t^2}{2\sigma_t^2}\right\} \] given \(Y_0 = 0\), \(\varepsilon_0 = 0\) and \(\sigma_0^2 = 1\). The log-likelihood is also very simple to evaluate: \[ \ell(\mathbf{Y} | \boldsymbol{\theta}) = - \frac{T}{2}\log(2\pi) - \frac{1}{2} \sum\limits_{t=1}^T \log(\sigma_t^2) - \frac{1}{2} \sum\limits_{t=1}^T \frac{\varepsilon_t^2}{\sigma_t^2}, \] where \(\varepsilon_t = Y_t - a_0 - a_1 Y_{t-1}\) and \(\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\) are computed recursively.
From experience in modelling financial markets we know that it is reasonable to assume the following independent prior distributions: \[ a_0 \sim \mathsf{N}(0,3), \quad a_1 \sim \mathsf{N}(0,3), \quad \alpha_0' = \log \alpha_0 \sim \mathsf{N}(-12.3,5), \quad \alpha_1' = \log \alpha_1 \sim \mathsf{N}(-2,5), \quad \beta_1' = \log \beta_1 \sim \mathsf{N}(-0.2,5), \] which yields an unrestricted parametrization through \(\boldsymbol{\theta}' = (a_0, a_1, \alpha_0', \alpha_1', \beta_1')^\top \in \mathbb{R}^5\). To fulfill the stationarity condition \(\alpha_1 + \beta_1 = \exp\{\alpha_1'\} + \exp\{\beta_1'\} < 1\), we include an indicator function in the pdf of the prior distribution: \[ p(\boldsymbol{\theta}') \propto p(a_0) \, p(a_1) \, p(\alpha_0') \, p(\alpha_1') \, p(\beta_1') \, \mathbb{I}_{\mathcal{S}}(\alpha_1', \beta_1'), \] where \(\mathcal{S} = \left\{ (s_1, s_2)^\top\in \mathbb{R}^2: e^{s_1} + e^{s_2} < 1 \right\}.\)
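As a quick sanity check (a simulation sketch of our own; the sample size is arbitrary), we can estimate how much probability mass the untruncated prior places on the stationarity region \(\mathcal{S}\):
set.seed(1)
lalpha1 <- rnorm(1e5, mean = -2, sd = sqrt(5))    # alpha1' prior (variance 5)
lbeta1 <- rnorm(1e5, mean = -0.2, sd = sqrt(5))   # beta1' prior (variance 5)
mean(exp(lalpha1) + exp(lbeta1) < 1)              # untruncated prior mass of S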
Unfortunately, the full-conditional distributions of the model parameters are difficult to work with. Hence, Gibbs sampling is very hard to perform.
We will therefore use the random-walk Metropolis-Hastings algorithm to sample from the posterior distribution of \(\boldsymbol{\theta}'\). Given the current state \(\boldsymbol{\theta}_m'\), a new value \(\boldsymbol{\psi} = \boldsymbol{\theta}_m' + \mathbf{Z}\) with \(\mathbf{Z} \sim \mathsf{N}_5(\mathbf{0}, \Sigma)\) is proposed and accepted with probability \[ \alpha(\boldsymbol{\theta}_m', \boldsymbol{\psi}) = \min \left\{ 1, \, \exp\left[ \ell(\boldsymbol{\psi} \,|\, \mathbf{Y}) - \ell(\boldsymbol{\theta}_m' \,|\, \mathbf{Y}) \right] \right\}, \] where \(\ell(\cdot \,|\, \mathbf{Y})\) denotes the posterior log-density; since the random-walk proposal is symmetric, the Hastings correction cancels.
If the proposed value \(\boldsymbol{\psi}\) does not fulfill the stationarity condition, it cannot be accepted, since \(\alpha(\boldsymbol{\theta}_m', \boldsymbol{\psi}) = 0\), and \(\boldsymbol{\theta}_{m+1}' := \boldsymbol{\theta}_m'\). If the proposed value \(\boldsymbol{\psi}\) yields a higher posterior log-density, then it is surely accepted, since \(\alpha(\boldsymbol{\theta}_m', \boldsymbol{\psi}) = 1\), and \(\boldsymbol{\theta}_{m+1}' := \boldsymbol{\psi}\).
The Metropolis-Hastings algorithm has to be implemented manually. We start with a function for the posterior log-density:
logposterior <- function(y, epsilon0 = 0, sigma20 = 1, y0 = 0,
                         state = c(a0 = 0, a1 = 0, lalpha0 = -12.3, lalpha1 = -2, lbeta1 = -0.2)){
  n <- length(y)
  alpha0 <- exp(state["lalpha0"])
  alpha1 <- exp(state["lalpha1"])
  beta1 <- exp(state["lbeta1"])
  if(alpha1 + beta1 < 1){    # stationarity condition
    # residuals epsilon_t = Y_t - a0 - a1 * Y_{t-1}
    epsilon <- y - state["a0"] - state["a1"] * c(y0, y[1:(n-1)])
    # GARCH(1,1) recursion for sigma_t^2; note that sigma20 is already a variance
    sigma2 <- numeric(n)
    sigma2[1] <- alpha0 + alpha1*epsilon0^2 + beta1*sigma20
    for(t in 1:(n-1)){
      sigma2[t+1] <- alpha0 + alpha1*epsilon[t]^2 + beta1*sigma2[t]
    }
    # log-likelihood ...
    loglik <- sum(-0.5*log(2*pi*sigma2) - 0.5*epsilon^2/sigma2)
    # ... plus the log-prior (up to additive constants)
    loglik <- loglik - state["a0"]^2/6 - state["a1"]^2/6 -
      (state["lalpha0"]+12.3)^2/10 - (state["lalpha1"]+2)^2/10 - (state["lbeta1"]+0.2)^2/10
  }else{
    loglik <- -Inf    # zero prior density outside the stationarity region
  }
  return(loglik)
}
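A quick check that the function works, evaluated at the default starting state (this is the initial-state value quoted later):
logposterior(dax)   # log-posterior at the default starting state (about 5927)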
Let the covariance matrix \(\Sigma\) that defines the distribution of the random-walk step be given. Then the algorithm can be written as follows:
library("mvtnorm") # for rmvnorm()
ARGARCH_MHalg <- function(y, M, Sigma, start, nprop = 50){
if(missing(start)){
start <- c(a0 = 0, a1 = 0, lalpha0 = -12.3, lalpha1 = -2, lbeta1 = -0.2)
}
chain <- proposal <- data.frame(a0 = numeric(M), a1 = numeric(M), lalpha0 = numeric(M),
lalpha1 = numeric(M), lbeta1 = numeric(M))
prob <- accepted <- logproposal <- numeric(M)
laststate <- start
loglast <- logposterior(y, state = laststate)
for(m in 1:M){
step <- rmvnorm(1, sigma = Sigma)
proposal[m,] <- step + laststate
logproposal[m] <- logposterior(y, state = unlist(proposal[m,]))
prob[m] <- min(1, exp(logproposal[m] - loglast))
accepted[m] <- rbinom(1, 1, prob[m])
if(accepted[m]){
laststate <- chain[m,] <- proposal[m,]
loglast <- logproposal[m]
}else{
chain[m,] <- laststate
}
}
# Where the next proposal could take us?
next_prop <- as.data.frame(rmvnorm(nprop, sigma = Sigma) +
matrix(rep(unlist(laststate), nprop), nrow = nprop, byrow = T))
colnames(next_prop) <- names(start)
next_logp <- apply(next_prop, 1, function(theta){logposterior(y, state = unlist(theta))})
next_prob <- pmin(1, exp(next_logp - loglast))
return(list(chain = chain,
start = start,
proposal = proposal,
logproposal = logproposal,
prob = prob,
accepted = accepted,
laststate = laststate,
loglast = loglast,
next_prop = next_prop,
next_prob = next_prob))
}
For example, let \(\Sigma\) be the covariance matrix of the prior distribution:
Sigma <- diag(c(3,3,5,5,5))
M <- 1000
start <- c(a0 = 0, a1 = 0, lalpha0 = -12.3, lalpha1 = -2, lbeta1 = -0.2)
set.seed(123456789)
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
Let us examine the trajectory of the sampled Markov chain:
traceplots <- function(mcmc, B = 0, M = nrow(mcmc$chain), thin = 1, COL = "slategrey"){
  LABS <- c("a0", "a1", "log(alpha0)", "log(alpha1)", "log(beta1)")
  names(LABS) <- c("a0", "a1", "lalpha0", "lalpha1", "lbeta1")
  inds <- seq(B+1, M, by = thin)   # kept iterations (after burn-in B, thinned)
  par(mfrow = c(2,3), mar = c(4,4,0.8,0.8))
  for(name in c("a0", "a1", "lalpha0", "lalpha1", "lbeta1")){
    plot(mcmc$chain[inds,name] ~ inds,
         ylab = LABS[name], xlab = "Index", type = "l")
  }
  # running acceptance rate
  plot(cumsum(mcmc$accepted[inds])/seq_along(inds) ~ inds, type = "l",
       xlab = "Index", ylab = "Probability", ylim = c(0,1), col = COL, lwd = 1)
  legend("topright", paste0(round(mean(mcmc$accepted), digits = 4)*100, " %"),
         col = COL, lty = 1, bty = "n", title = "Acceptance rate")
}
traceplots(mcmc)
None of the proposed states has been accepted! Even though the covariance matrix \(\Sigma\) can, in theory, be chosen arbitrarily for Metropolis-Hastings to work, in practice its choice is crucial for obtaining a reasonable MCMC approximation. The log-posterior at the initial state is 5927, while for the proposed values we get the following summary:
sum(mcmc$logproposal == -Inf)
## [1] 645
summary(mcmc$logproposal[is.finite(mcmc$logproposal)])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -1116794   -31502    -8917   -47254    -4506     3244
Even the best proposal falls short by \(5927 - 3244 = 2683\) on the log scale; thus, the highest observed acceptance probability is exp(-2683), which is nearly zero.
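In double precision this number is far below the smallest representable positive value (about 5e-324), so it underflows to exact zero:
exp(-2683)
## [1] 0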
In order to increase the acceptance rate we need a much more appropriate matrix \(\Sigma\). The difference in posterior log-density between the current and the proposed state should be of unit size, so that its exponential yields an acceptance probability under which proposals can actually succeed. This is achieved when the proposal distribution resembles the posterior. The problem is that we do not know what the posterior looks like; that is exactly why we use MCMC in the first place.
The problem with our previous choice of \(\Sigma\) was that it proposed far too large steps (the prior is flat compared to the posterior). We need to reduce the scales appropriately.
diagS <- diag(c(0.005, 0.1, 0.5, 0.5, 0.05))   # proposal standard deviations
corrS <- diag(5)                               # proposal correlation matrix (identity)
Sigma <- diagS %*% corrS %*% diagS             # covariance = D R D
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
traceplots(mcmc)

image_partial_posterior <- function(data,
                                    mcmc,
                                    vars = c("lalpha1", "lbeta1"),
                                    XLIM = c(-4, -0.02),
                                    YLIM = c(-0.7, -0.02),
                                    fixvalues = c(a0 = 0.000795, a1 = 0.00730, lalpha0 = -11.98,
                                                  lalpha1 = -2.05, lbeta1 = -0.211),
                                    grid.size = 30,
                                    Ntercol = 100,
                                    add_trace = FALSE,
                                    add_where_next = FALSE){
  # evaluate the log-posterior on a grid over the two parameters in vars,
  # keeping the remaining parameters fixed at fixvalues
  grid1 <- seq(from = XLIM[1], to = XLIM[2], length.out = grid.size)
  grid2 <- seq(from = YLIM[1], to = YLIM[2], length.out = grid.size)
  Z <- sapply(grid2, function(y){
    sapply(grid1, function(x){
      auxstate <- fixvalues
      auxstate[vars] <- c(x, y)
      logposterior(data, state = auxstate)
    })
  })
  Z[is.infinite(Z)] <- min(Z[is.finite(Z)]) - 1   # make -Inf values plottable
  image(grid1, grid2, Z, col = terrain.colors(Ntercol),
        xlab = vars[1], ylab = vars[2])
  contour(grid1, grid2, z = Z, add = TRUE, drawlabels = TRUE)
  # stationarity condition when both lalpha1 and lbeta1 are plotted
  if(is.element("lalpha1", vars) & is.element("lbeta1", vars)){
    lines(x = grid1, y = log(1-exp(grid1)), col = "darkviolet", lty = 2, lwd = 2)
  }
  # other cases, when just one of them is asked to be plotted
  if(vars[1] == "lalpha1" & vars[2] != "lbeta1"){
    abline(v = log(1-exp(fixvalues["lbeta1"])), col = "darkviolet", lty = 2, lwd = 2)
  }
  if(vars[2] == "lalpha1" & vars[1] != "lbeta1"){
    abline(h = log(1-exp(fixvalues["lbeta1"])), col = "darkviolet", lty = 2, lwd = 2)
  }
  if(vars[1] != "lalpha1" & vars[2] == "lbeta1"){
    abline(h = log(1-exp(fixvalues["lalpha1"])), col = "darkviolet", lty = 2, lwd = 2)
  }
  if(vars[2] != "lalpha1" & vars[1] == "lbeta1"){
    abline(v = log(1-exp(fixvalues["lalpha1"])), col = "darkviolet", lty = 2, lwd = 2)
  }
  if(add_trace){
    # path through the accepted states; point size shows how many
    # iterations the chain stayed in each state
    states <- rbind(mcmc$start[vars],
                    mcmc$chain[as.logical(mcmc$accepted), vars])
    numstays <- diff(c(0, which(as.logical(mcmc$accepted))))
    fnumstays <- cut(numstays, breaks = c(0, 1, 3, 5, 10, 20, Inf),
                     labels = c("1", "2-3", "4-5", "6-10", "11-20", ">20"))
    CEX <- seq(0.4, 2, length.out = nlevels(fnumstays))
    names(CEX) <- levels(fnumstays)
    lines(states, type = "b", col = "black", lwd = 1, lty = 1,
          cex = CEX[fnumstays], pch = 21, bg = "slategrey")
  }
  if(add_where_next){
    # arrows from the last state towards the next potential proposals
    arrows(x0 = as.numeric(mcmc$laststate[vars[1]]),
           y0 = as.numeric(mcmc$laststate[vars[2]]),
           x1 = mcmc$next_prop[,vars[1]],
           y1 = mcmc$next_prop[,vars[2]],
           col = "orchid", lty = 2)
  }
}
par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.01, 0.015), YLIM = c(-1, 1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-13, -11), YLIM = c(-3, -1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-3, -1), YLIM = c(-0.5, -0.05),
                        add_trace = TRUE, add_where_next = TRUE)

Finally, a couple of accepted states! But this is still not enough.
diagS <- diag(c(0.001, 0.1, 0.1, 0.1, 0.01))
corrS <- diag(5)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.001, 0.0028), YLIM = c(-0.35, 0.35),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE, add_where_next = TRUE)
Better, but still not great.
diagS <- diag(c(0.0005, 0.05, 0.15, 0.08, 0.012))
corrS <- diag(5)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0005, 0.0018), YLIM = c(-0.2, 0.22),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE, add_where_next = TRUE)
What would happen if we reduced the standard deviations too much?
diagS <- diag(c(0.000001, 0.001, 0.01, 0.005, 0.001))
corrS <- diag(5)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma,
                      start = c(a0 = 0.000795, a1 = 0.00730, lalpha0 = -11.98,
                                lalpha1 = -2.05, lbeta1 = -0.211))
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0001, 0.0015), YLIM = c(-0.1, 0.1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE, add_where_next = TRUE)
The acceptance rate is high; however, the chain stays for a long time in certain parts of the posterior, and it takes a long time to move elsewhere. The short steps also result in high autocorrelation, which complicates the inference even further.
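The autocorrelation cost can be quantified by the effective sample size, which discounts the number of draws by their autocorrelation (a rough diagnostic; this sketch assumes the coda package is installed):
library("coda")
effectiveSize(as.mcmc(as.matrix(mcmc$chain)))   # far fewer "independent" draws than M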
Finally, we arrive at a reasonable choice of standard deviations:
diagS <- diag(c(0.0003, 0.012, 0.13, 0.08, 0.012))
corrS <- diag(5)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0001, 0.0015), YLIM = c(-0.1, 0.1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE, add_where_next = TRUE)
We also notice that the posterior samples of \(\alpha_0'\), \(\alpha_1'\) and \(\beta_1'\) are correlated:
cor(mcmc$chain)
##                  a0          a1     lalpha0     lalpha1      lbeta1
## a0       1.00000000 -0.15105622 -0.03846381  0.06948551 -0.09375933
## a1      -0.15105622  1.00000000 -0.03703793 -0.12431537  0.18815676
## lalpha0 -0.03846381 -0.03703793  1.00000000 -0.47330453 -0.33768792
## lalpha1  0.06948551 -0.12431537 -0.47330453  1.00000000 -0.49061351
## lbeta1  -0.09375933  0.18815676 -0.33768792 -0.49061351  1.00000000
This is why we also add some correlation into the proposal distribution:
diagS <- diag(c(0.0003, 0.012, 0.13, 0.08, 0.012))
corrS <- matrix(c(   1,  0.0,  0.0,  0.0,  0.0,
                   0.0,    1,  0.0,  0.0,  0.0,
                   0.0,  0.0,    1, -0.4, -0.5,
                   0.0,  0.0, -0.4,    1, -0.3,
                   0.0,  0.0, -0.5, -0.3,    1), nrow = 5, byrow = TRUE)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma, start)
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0001, 0.0015), YLIM = c(-0.1, 0.1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE, add_where_next = TRUE)

This seems to improve both the acceptance probability and the exploration of the posterior. But correlations within the proposal density also have a downside. Imagine we started from a point far away from where the posterior is concentrated.
diagS <- diag(c(0.0003, 0.012, 0.13, 0.08, 0.012))
corrS <- matrix(c(   1,  0.0,  0.0,  0.0,  0.0,
                   0.0,    1,  0.0,  0.0,  0.0,
                   0.0,  0.0,    1, -0.7, -0.7,
                   0.0,  0.0, -0.7,    1, -0.7,
                   0.0,  0.0, -0.7, -0.7,    1), nrow = 5, byrow = TRUE)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M, Sigma,
                      start = c(a0 = -0.001, a1 = -0.01, lalpha0 = -13,
                                lalpha1 = -3, lbeta1 = -0.3))
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0012, 0.002), YLIM = c(-0.1, 0.1),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-16, -11.5), YLIM = c(-3.5, -1.7),
                        add_trace = TRUE, add_where_next = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-3.5, -1.7), YLIM = c(-0.35, -0.05),
                        add_trace = TRUE, add_where_next = TRUE)

The correlations make it more difficult to move in the direction of steepest ascent. The chain is then more likely to zig-zag, which prolongs the burn-in phase.
Idea: find a reasonable starting value first, and only then use correlations, as sketched below.
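A sketch of this two-phase strategy (the object names pilot and mcmc2 are ours): first run a short chain with an uncorrelated proposal from the remote starting point, then restart from its last state with the correlated proposal:
diagS <- diag(c(0.0003, 0.012, 0.13, 0.08, 0.012))
# Phase 1: uncorrelated proposal from the remote starting point
pilot <- ARGARCH_MHalg(dax, M = 1000, diagS %*% diagS,
                       start = c(a0 = -0.001, a1 = -0.01, lalpha0 = -13,
                                 lalpha1 = -3, lbeta1 = -0.3))
# Phase 2: correlated proposal, restarted from where the pilot chain ended
corrS <- matrix(c(   1,  0.0,  0.0,  0.0,  0.0,
                   0.0,    1,  0.0,  0.0,  0.0,
                   0.0,  0.0,    1, -0.4, -0.5,
                   0.0,  0.0, -0.4,    1, -0.3,
                   0.0,  0.0, -0.5, -0.3,    1), nrow = 5, byrow = TRUE)
mcmc2 <- ARGARCH_MHalg(dax, M = 10000, diagS %*% corrS %*% diagS,
                       start = unlist(pilot$laststate))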
Finally, we arrive at the following setting:
diagS <- diag(c(0.0003, 0.012, 0.13, 0.08, 0.012))
corrS <- matrix(c(    1,   0.0,   0.0,   0.0,   0.0,
                    0.0,     1,   0.0,   0.0,   0.0,
                    0.0,   0.0,     1,  -0.5, -0.45,
                    0.0,   0.0,  -0.5,     1,  -0.3,
                    0.0,   0.0, -0.45,  -0.3,     1), nrow = 5, byrow = TRUE)
Sigma <- diagS %*% corrS %*% diagS
mcmc <- ARGARCH_MHalg(dax, M = 10000, Sigma, start)
traceplots(mcmc)

par(mfrow = c(1,3), mar = c(4,4,1,0.5))
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("a0", "a1"),
                        XLIM = c(-0.0001, 0.0015), YLIM = c(-0.1, 0.1),
                        add_trace = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha0", "lalpha1"),
                        XLIM = c(-12.5, -11.5), YLIM = c(-2.4, -1.7),
                        add_trace = TRUE)
image_partial_posterior(data = dax, mcmc = mcmc, vars = c("lalpha1", "lbeta1"),
                        XLIM = c(-2.4, -1.7), YLIM = c(-0.245, -0.175),
                        add_trace = TRUE)

summary(mcmc$chain)
##        a0                   a1                 lalpha0          lalpha1          lbeta1
##  Min.   :-5.327e-05   Min.   :-0.087638   Min.   :-12.46   Min.   :-2.397   Min.   :-0.2449
##  1st Qu.: 6.552e-04   1st Qu.:-0.011471   1st Qu.:-12.07   1st Qu.:-2.112   1st Qu.:-0.2177
##  Median : 7.986e-04   Median : 0.006885   Median :-11.98   Median :-2.050   Median :-0.2111
##  Mean   : 7.960e-04   Mean   : 0.007507   Mean   :-11.98   Mean   :-2.054   Mean   :-0.2110
##  3rd Qu.: 9.359e-04   3rd Qu.: 0.027635   3rd Qu.:-11.88   3rd Qu.:-1.997   3rd Qu.:-0.2044
##  Max.   : 1.536e-03   Max.   : 0.104214   Max.   :-11.53   Max.   :-1.773   Max.   :-0.1781
apply(mcmc$chain, 2, sd)
##           a0           a1      lalpha0      lalpha1       lbeta1
## 0.0002095183 0.0280057539 0.1365690414 0.0912294258 0.0096818757
par(mfrow = c(1,1), mar = c(4,4,4,4))
psych::pairs.panels(mcmc$chain)

par(mfrow = c(2,3), mar = c(4,4,4,0.5))
for(i in 1:5){
  acf(mcmc$chain[,i], main = colnames(mcmc$chain)[i])
}

It would be great if the proposal distribution could be tuned automatically.
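One common recipe is the following sketch of such automatic tuning (our own illustration): estimate the posterior covariance from a pilot chain and rescale it by \(2.38^2/d\), the asymptotically optimal random-walk scaling for dimension \(d\) (Roberts, Gelman and Gilks, 1997):
pilot <- ARGARCH_MHalg(dax, M = 2000, Sigma, start)
Sigma_auto <- (2.38^2 / 5) * cov(pilot$chain)   # d = 5 parameters
mcmc_auto <- ARGARCH_MHalg(dax, M = 10000, Sigma_auto,
                           start = unlist(pilot$laststate))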
Another idea comes from the Laplace approximation. Locally, the posterior log-density \(\ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})\) can be approximated by a quadratic form (Taylor expansion): \[ \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y}) = \ell(\widehat{\boldsymbol{\theta}}' \,|\, \mathbf{Y}) + \left. \dfrac{\partial \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})}{\partial \boldsymbol{\theta}'} \right|_{\boldsymbol{\theta}' = \widehat{\boldsymbol{\theta}}'} \left(\boldsymbol{\theta}' - \widehat{\boldsymbol{\theta}}'\right) + \frac{1}{2} \left(\boldsymbol{\theta}' - \widehat{\boldsymbol{\theta}}'\right)^\top \left.\dfrac{\partial^2 \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})}{\partial \boldsymbol{\theta}' \partial \boldsymbol{\theta}'^\top} \right|_{\boldsymbol{\theta}' = \widehat{\boldsymbol{\theta}}'} \left(\boldsymbol{\theta}' - \widehat{\boldsymbol{\theta}}'\right) + \cdots \]
If \(\widehat{\boldsymbol{\theta}}'\) maximizes \(\ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})\), then, as at a stationary point, the gradient vanishes: \(\left.\dfrac{\partial \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})}{\partial \boldsymbol{\theta}'} \right|_{\boldsymbol{\theta}' = \widehat{\boldsymbol{\theta}}'} = \mathbf{0}\), and the approximation can be written as \[ \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y}) \approx \mathrm{const.} - \frac{1}{2} \left(\boldsymbol{\theta}' - \widehat{\boldsymbol{\theta}}'\right)^\top \left[-\left.\dfrac{\partial^2 \ell(\boldsymbol{\theta}' \,|\, \mathbf{Y})}{\partial \boldsymbol{\theta}' \partial \boldsymbol{\theta}'^\top} \right|_{\boldsymbol{\theta}' = \widehat{\boldsymbol{\theta}}'}\right] \left(\boldsymbol{\theta}' - \widehat{\boldsymbol{\theta}}'\right), \] which resembles the log-pdf of a multivariate normal distribution. Hence, the negative Hessian matrix evaluated at \(\widehat{\boldsymbol{\theta}}'\) is a reasonable choice for \(\Sigma^{-1}\) if proposals are made close to the posterior mode \(\widehat{\boldsymbol{\theta}}'\).
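A sketch of this idea with optim() (object names are ours; the default Nelder-Mead method is used because the log-posterior is \(-\infty\) outside the stationarity region, and it may need many iterations): minimize the negative log-posterior, request the numerically differentiated Hessian at the optimum, and invert it:
negLogPost <- function(theta){
  names(theta) <- names(start)            # logposterior() expects a named state
  as.numeric(-logposterior(dax, state = theta))
}
fit <- optim(start, negLogPost, hessian = TRUE, control = list(maxit = 5000))
Sigma_laplace <- solve(fit$hessian)       # inverse negative Hessian at the mode
mcmc_laplace <- ARGARCH_MHalg(dax, M = 10000, (2.38^2/5) * Sigma_laplace,
                              start = fit$par)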
Hamiltonian Monte Carlo, used by RStan, pushes this idea even further.