To model hypoglycaemic episodes in people with Type 1 diabetes in clinical trials and observational studies, the standard approach is to use negative binomial regression. The problem with this is that negative binomial regression assumes that person-time intervals as exchangeable: the individual identifiers attached to person-time intervals are ignored.
To exploit the information provided by repeat observations on the same individuals, we need a a mixed model. Because the data are counts with many zeroes, we need a Poisson likelihood.
Linear mixed models, with a gaussian likelihood, are easy to fit because the random effects can be integrated out. For binary data (Bernoulli likelihood) or count data (Poisson likelihood), there is no exact solution to the integral.
One approach to this is to approximate the integral and maximize the marginal likelihood of the model parameters: a Laplace approximation is implemented in the glmer()
function in the lmer4
package. Alternatively, a fully Bayesian approach can be used to sample the posterior density of the parameters. This does not rely on asymptotic approximations, and can be extended to more complex problems.
This tutorial demonstrates both approaches using a simulated dataset with counts generated from a Poisson distribution in a mixed model with a single fixed-effect covariate.
The first step is to simulate some data for Poisson counts with individual baseline hazard rates drawn from a log-normal distribution
library(ddpcr)
library(pander)
library(stanhl) # # edit defaults.R before building, with print(list(get_all_styles()))
stanhl_opts$set(style="tango")
quiet(stanhl_html()) # set up stan highlighting for HTML
stanhl_opts$set(style="tango")
## rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
N.indivs <- 200 # num individuals
beta <- -0.5 # log rate ratio parameter
mean.numintervals <- 10
mean.logbaseline <- 0
sd.logbaseline <- 2
newrun <- TRUE
#newrun <- FALSE
withvb <- FALSE
if(newrun) {
## generate random number of person-time intervals for each individual
T <- 1 + rpois(N.indivs, mean.numintervals)
## simulate baseline hazard rates from a log-normal distribution
logbaseline <- rnorm(N.indivs, mean.logbaseline, sd.logbaseline)
## simulate fixed covariate
x1 <- rnorm(N.indivs, 0, 1) # fixed covariate
x2 <- rnorm(N.indivs, 0, 1) # fixed covariate
## simulate from model with no changepoint
lambda <- exp(logbaseline + beta * x1)
mixdata <- NULL
for(i in 1:N.indivs) {
mixdata <- rbind(mixdata,
data.frame(indiv=as.factor(rep(i, T[i])),
time=1:T[i],
x1=rep(x1[i], T[i]),
x2=rep(x2[i], T[i]),
y=rpois(T[i], lambda[i])
))
}
head(mixdata)
y <- mixdata$y
N <- length(y)
N_indivs <- length(unique(mixdata$indiv))
X <- model.matrix(object = y ~ x1 + x2, data = mixdata)[, -1]
X <- scale(X, center=TRUE, scale=FALSE) # centre covariates
U <- ncol(X)
indiv <- as.integer(mixdata$indiv)
scale_icept <- 5
priormean_icept <- -5
} else {
load(file="glmpoisson.RData")
}
table(y)
## y
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 951 339 160 131 82 75 58 42 27 25 18 19 13 10 12
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## 7 3 8 8 8 11 7 7 5 13 8 11 8 6 6
## 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
## 4 5 3 6 2 4 5 6 6 7 4 2 3 2 7
## 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## 5 4 8 5 2 4 2 1 2 2 1 1 1 1 3
## 62 65 1242 1246 1248 1255 1269 1274 1279 1355
## 1 1 1 1 1 1 1 1 1 1
This simulates a sample of 200 individuals observed for an average of 11 time intervals with Poisson regression coefficients of -0.5 for the effects of covariate x1
and 0 for the effect of covariate x2
. With such large variation between individuals, and so many time intervals of observation on each individual, we can expect negative binomial regression, which treats all person-time intervals as exchangeable, to perform poorly.
We start by modelling this dataset using the standard approach: negative binomial regression
require(MASS)
nbmodel <-
glm.nb(formula = y ~ x1 + x2, data = mixdata, link = log)
summary(nbmodel)
##
## Call:
## glm.nb(formula = y ~ x1 + x2, data = mixdata, link = log, init.theta = 0.2293632207)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6646 -1.1904 -0.7081 -0.1298 5.0049
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.96979 0.04596 42.862 < 2e-16 ***
## x1 -1.03335 0.04977 -20.762 < 2e-16 ***
## x2 -0.14993 0.04507 -3.327 0.000879 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.2294) family taken to be 1)
##
## Null deviance: 2522.6 on 2195 degrees of freedom
## Residual deviance: 2112.1 on 2193 degrees of freedom
## AIC: 10667
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.22936
## Std. Err.: 0.00803
##
## 2 x log-likelihood: -10658.74800
pander(summary(nbmodel)$coefficients, digits=c(3, 2, 3, 1))
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | 1.97 | 0.046 | 42.9 | 0 |
x1 | -1.03 | 0.05 | -20.8 | 1e-95 |
x2 | -0.15 | 0.045 | -3.33 | 9e-04 |
This gives seriously misleading results: the coefficients of covariates x1
and x2
, which should be -0.5 and 0, are estimated as -1.03 and -0.15.
lme4
packageInstead we try fitting the correct model – a generalized linear mixed model with Poisson likelihood – using the lme4
package. This maximizes the marginal likelihood of the fixed effects using an approximate algorithm: adaptive Gauss-Hermite quadrature.
require(lme4)
## adaptive Gauss-Hermite quadrature
ghmodel <- lme4::glmer(formula = y ~ x1 + x2 + (1 | indiv), data = mixdata, family = poisson,
control = glmerControl(),
start = NULL, verbose = 0L, nAGQ = 1L, # subset, weights, na.action,
offset = NULL, contrasts = NULL, # mustart, etastart,
devFunOnly = FALSE)
summary(ghmodel)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: y ~ x1 + x2 + (1 | indiv)
## Data: mixdata
## Control: glmerControl()
##
## AIC BIC logLik deviance df.resid
## 6896.1 6918.9 -3444.1 6888.1 2192
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6459 -0.5672 -0.2809 0.3923 5.8859
##
## Random effects:
## Groups Name Variance Std.Dev.
## indiv (Intercept) 4.359 2.088
## Number of obs: 2196, groups: indiv, 200
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.004768 0.153949 -0.031 0.9753
## x1 -0.446290 0.160754 -2.776 0.0055 **
## x2 0.007380 0.148867 0.050 0.9605
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) x1
## x1 -0.085
## x2 -0.040 0.141
On this simulated dataset, stan_glmer()
works well. On larger and more complex datasets, I have found that this approach is very slow and gives wrong answers
Where approximate methods do not work, to use a fully Bayesian approach, relying on Markov chain Monte Carlo simulation to sample the posterior.
A version with Stan code written directly gives us more flexibility than relying on the rstanarm
package. It’s also faster.
// mixed model with Poisson likelihood
data {
int < lower = 1 > N; // number of observations
int < lower = 1 > U; // number of unpenalized columns in X matrix
int < lower = 1 > N_indivs; // number of individuals
int < lower = 1 > indiv[N]; // subscripts indexing individuals
real priormean_icept; // prior mean of intercept
real < lower=0 > scale_icept ; // sd of prior on intercept
int < lower=0 > y[N]; // count of events
matrix[N, U] X; // X matrix without ones for intercept in column 1
}
transformed data {
matrix[N, U] Q_ast; // QR reparameterization
matrix[U, U] R_ast;
matrix[U, U] R_ast_inverse;
// thin and scale the QR decomposition
Q_ast = qr_Q(X)[, 1:U] * sqrt(N - 1);
R_ast = qr_R(X)[1:U, ] / sqrt(N - 1);
R_ast_inverse = inverse(R_ast);
}
parameters {
real alpha; // coefficient of variation of random effects
real < lower=0 > sigma_indiv; // SD of random effects
vector[U] theta; // coefficients on Q_ast
vector[N_indivs] xi; // random effects with standard normal prior
// vector[N_indivs] w; // individual random effects
}
transformed parameters {
vector[N] eta; // linear predictor on scale of log hazard rate
vector[N_indivs] w; // individual random effects
eta = Q_ast * theta;
w = alpha + xi * sigma_indiv; // construct w by scaling xi by sigma_indiv and shifting by alpha
}
model {
sigma_indiv ~ cauchy(0, 5); // half-cauchy prior
alpha ~ normal(0, 10);
theta ~ normal(0, scale_icept);
xi ~ normal(0, 1);
for(i in 1:N) {
y[i] ~ poisson_log(eta[i] + w[indiv[i]]);
}
}
generated quantities {
vector[U] beta;
beta = R_ast_inverse * theta; // coefficients on x
}
The Stan code is just a generalized linear model with poisson likelihood and logarithmic link function, with a random effect for each individuals. For efficient sampling there is a QR reparameterization on the regression coefficients, and a non-centred reparameterization of the random effects as described in the user manual.
## version with stan code written directly
require(rstan)
if(newrun) {
glmmpoisson.model <- stan_model(file="./glmm_poisson.stan")
pars.names = c("alpha", "beta", "sigma_indiv", "lp__")
## MCMC samples
glmm.samples.mc <- sampling(glmmpoisson.model, #data=xdata.list,
iter=1000, warmup=500,
chains=4, seed=123, open_progress=FALSE,
pars=pars.names)
}
pairs(glmm.samples.mc, pars=pars.names)
stan_trace(glmm.samples.mc, pars=pars.names)
stan_dens(glmm.samples.mc, pars=pars.names)
pander(round(summary(glmm.samples.mc,
probs=c(0.1, 0.9))$summary, 3))
mean | se_mean | sd | 10% | 90% | n_eff | Rhat | |
---|---|---|---|---|---|---|---|
alpha | -0.072 | 0.022 | 0.162 | -0.287 | 0.14 | 55.43 | 1.081 |
beta[1] | -0.435 | 0.021 | 0.159 | -0.625 | -0.236 | 54.44 | 1.081 |
beta[2] | 0.003 | 0.022 | 0.161 | -0.211 | 0.215 | 54.7 | 1.07 |
sigma_indiv | 2.135 | 0.01 | 0.119 | 1.987 | 2.283 | 151.3 | 1.022 |
**lp__** | 82044 | 0.954 | 14.07 | 82026 | 82062 | 217.3 | 1.015 |
The most useful diagnostic is the pairs plot. This shows that the parameter sigma_indiv
is correlated with the log posterior probability (equivalent to the energy in a Hamiltonian system). This might be fixed by some reparameterization.
The trace plots show that the sampling chains are mixing rather slowly. The effective sample size is small because of the autocorrelation of the sampled values, and the R-hat values are greater than 1.1, indicating that the ratio of variance between chains to variance within chains is high.
In principle these limitations could be overcome either by a clever reparameterization of the model, or by running the sampler for longer. As Stan performs well on the real datasets we are modelling, we won’t bother trying to improve the parameterization.
We can extend this Stan model to include a changepoint in x at which the slope changes from zero to beta. This will be demonstrated with a simulated dataset with a changepoint at x = 0
.
if(newrun) {
## simulate from model with changepoint
gamma <- 0.5
x <- rnorm(N.indivs, 0, 1) # fixed covariate
lambda <- exp(logbaseline + beta * x + gamma * ifelse(x < 0, 0, x))
mixdata.chpt <- NULL
for(i in 1:N.indivs) {
mixdata.chpt <- rbind(mixdata.chpt,
data.frame(indiv=as.factor(rep(i, T[i])),
time=1:T[i],
x=rep(x[i], T[i]),
x1=rep(x1[i], T[i]),
x2=rep(x2[i], T[i]),
y=rpois(T[i], lambda[i])
))
}
}
# head(mixdata.chpt)
y <- mixdata.chpt$y
x <- mixdata.chpt$x
X <- model.matrix(object = y ~ x1 + x2, data = mixdata.chpt)[, -1]
X <- scale(X, center=TRUE, scale=FALSE)
U <- ncol(X)
indiv <- as.integer(mixdata$indiv)
scale_icept <- 5