Introduction

To model hypoglycaemic episodes in people with Type 1 diabetes in clinical trials and observational studies, the standard approach is to use negative binomial regression. The problem with this is that negative binomial regression treats person-time intervals as exchangeable: the individual identifiers attached to person-time intervals are ignored.

To exploit the information provided by repeat observations on the same individuals, we need a mixed model. Because the data are counts with many zeroes, we need a Poisson likelihood.

Linear mixed models, with a Gaussian likelihood, are easy to fit because the random effects can be integrated out analytically. For binary data (Bernoulli likelihood) or count data (Poisson likelihood), there is no exact solution to the integral.
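Concretely, for a Poisson mixed model with individual random intercepts $b_i \sim N(0, \sigma^2)$, the marginal likelihood contains one intractable integral per individual:

$$ L(\beta, \sigma) = \prod_i \int \Big[ \prod_t \mathrm{Poisson}\big(y_{it} \mid \exp(x_{it}^\top \beta + b_i)\big) \Big] \, \phi(b_i; 0, \sigma^2) \, db_i, $$

where $\phi$ is the normal density. Unlike the Gaussian case, this product of Poisson terms and a normal density has no closed-form integral.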

One approach to this is to approximate the integral and maximize the resulting marginal likelihood of the model parameters: a Laplace approximation is implemented in the glmer() function in the lme4 package. Alternatively, a fully Bayesian approach can be used to sample from the posterior density of the parameters. This does not rely on asymptotic approximations, and can be extended to more complex problems.

This tutorial demonstrates both approaches using a simulated dataset with counts generated from a Poisson distribution in a mixed model with a single fixed-effect covariate.

Simulated dataset

The first step is to simulate some data: Poisson counts with individual baseline hazard rates drawn from a log-normal distribution.
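In symbols, the simulation model is

$$ y_{it} \sim \mathrm{Poisson}(\lambda_i), \qquad \log \lambda_i = b_i + \beta x_{1i}, \qquad b_i \sim N(\mu_b, \sigma_b^2), $$

with $\beta = -0.5$, $\mu_b = 0$ and $\sigma_b = 2$ in the code below; a second covariate x2 is simulated but has no effect on the rate.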

library(ddpcr)
library(pander)
library(stanhl)  # edit defaults.R before building, with print(list(get_all_styles()))
stanhl_opts$set(style="tango")
quiet(stanhl_html()) # set up stan highlighting for HTML
## rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

N.indivs <- 200 # num individuals
beta <- -0.5 # log rate ratio parameter
mean.numintervals <- 10
mean.logbaseline <- 0
sd.logbaseline <- 2

newrun <- TRUE
#newrun <- FALSE
withvb <- FALSE

if(newrun) { 
    ## generate random number of person-time intervals for each individual
    T <- 1 + rpois(N.indivs, mean.numintervals) # note: T masks the usual shorthand for TRUE
    ## simulate baseline hazard rates from a log-normal distribution
    logbaseline <- rnorm(N.indivs, mean.logbaseline, sd.logbaseline)
    ## simulate fixed covariates
    x1 <- rnorm(N.indivs, 0, 1) # covariate with true coefficient beta = -0.5
    x2 <- rnorm(N.indivs, 0, 1) # covariate with no effect on the rate
    
    ## simulate from model with no changepoint
    lambda <- exp(logbaseline + beta * x1)
    mixdata <- NULL
    for(i in 1:N.indivs) {
        mixdata <- rbind(mixdata,
                         data.frame(indiv=as.factor(rep(i, T[i])),
                                    time=1:T[i],
                                    x1=rep(x1[i], T[i]),
                                    x2=rep(x2[i], T[i]),
                                    y=rpois(T[i], lambda[i])
                                    )) 
    }
    head(mixdata)
    y <- mixdata$y
    N <- length(y)
    N_indivs <- length(unique(mixdata$indiv))
    X <- model.matrix(object =  y ~ x1 + x2, data = mixdata)[, -1] # drop intercept column
    X <- scale(X, center=TRUE, scale=FALSE) # centre covariates
    U <- ncol(X)
    indiv <- as.integer(mixdata$indiv)
    scale_icept <- 5
    priormean_icept <- -5

} else {
    load(file="glmpoisson.RData")
}

table(y)
## y
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##  951  339  160  131   82   75   58   42   27   25   18   19   13   10   12 
##   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29 
##    7    3    8    8    8   11    7    7    5   13    8   11    8    6    6 
##   30   31   32   33   34   35   36   37   38   39   40   41   42   43   44 
##    4    5    3    6    2    4    5    6    6    7    4    2    3    2    7 
##   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59 
##    5    4    8    5    2    4    2    1    2    2    1    1    1    1    3 
##   62   65 1242 1246 1248 1255 1269 1274 1279 1355 
##    1    1    1    1    1    1    1    1    1    1
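As an aside, the rbind() loop above grows the data frame one individual at a time; an equivalent vectorised construction in base R is sketched below (the object name mixdata2 is mine, and the Poisson draws come out in a different order, so it will not reproduce the identical dataset):

## loop-free construction of the same long-format data frame
mixdata2 <- data.frame(indiv = as.factor(rep(seq_len(N.indivs), T)),
                       time = sequence(T),
                       x1 = rep(x1, T),
                       x2 = rep(x2, T),
                       y = rpois(sum(T), rep(lambda, T)))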

This simulates a sample of 200 individuals observed for an average of 11 time intervals, with Poisson regression coefficients of -0.5 for the effect of covariate x1 and 0 for the effect of covariate x2. With such large variation between individuals (an SD of 2 on the log rate scale), and so many time intervals of observation on each individual, we can expect negative binomial regression, which treats all person-time intervals as exchangeable, to perform poorly.

Negative binomial regression

We start by modelling this dataset using the standard approach: negative binomial regression.

require(MASS)
nbmodel <-
    glm.nb(formula =  y ~ x1 +  x2, data = mixdata, link = log)

summary(nbmodel)
## 
## Call:
## glm.nb(formula = y ~ x1 + x2, data = mixdata, link = log, init.theta = 0.2293632207)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6646  -1.1904  -0.7081  -0.1298   5.0049  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.96979    0.04596  42.862  < 2e-16 ***
## x1          -1.03335    0.04977 -20.762  < 2e-16 ***
## x2          -0.14993    0.04507  -3.327 0.000879 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.2294) family taken to be 1)
## 
##     Null deviance: 2522.6  on 2195  degrees of freedom
## Residual deviance: 2112.1  on 2193  degrees of freedom
## AIC: 10667
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.22936 
##           Std. Err.:  0.00803 
## 
##  2 x log-likelihood:  -10658.74800
pander(summary(nbmodel)$coefficients, digits=c(3, 2, 3, 1))
              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)       1.97        0.046      42.9          0
x1               -1.03        0.05      -20.8      1e-95
x2               -0.15        0.045     -3.33      9e-04

This gives seriously misleading results: the coefficients of covariates x1 and x2, which should be -0.5 and 0, are estimated as -1.03 and -0.15, and the standard errors, which take no account of the clustering of observations within individuals, are far too small.

Fitting a generalized linear mixed model with Poisson likelihood using the lme4 package

Instead we try fitting the correct model – a generalized linear mixed model with Poisson likelihood – using the lme4 package. This maximizes an approximation to the marginal likelihood of the fixed effects: with the default nAGQ = 1, glmer() uses the Laplace approximation, the special case of adaptive Gauss-Hermite quadrature with a single quadrature point.

require(lme4)
## Laplace approximation (adaptive Gauss-Hermite quadrature with one
## quadrature point, nAGQ = 1)
ghmodel <- lme4::glmer(formula = y ~ x1 + x2 + (1 | indiv), data = mixdata,
                       family = poisson, control = glmerControl(), nAGQ = 1L)
summary(ghmodel)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: y ~ x1 + x2 + (1 | indiv)
##    Data: mixdata
## Control: glmerControl()
## 
##      AIC      BIC   logLik deviance df.resid 
##   6896.1   6918.9  -3444.1   6888.1     2192 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6459 -0.5672 -0.2809  0.3923  5.8859 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  indiv  (Intercept) 4.359    2.088   
## Number of obs: 2196, groups:  indiv, 200
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -0.004768   0.153949  -0.031   0.9753   
## x1          -0.446290   0.160754  -2.776   0.0055 **
## x2           0.007380   0.148867   0.050   0.9605   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##    (Intr) x1    
## x1 -0.085       
## x2 -0.040  0.141
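Because the model has a single scalar random effect, the accuracy of the Laplace fit can be checked by increasing the number of quadrature points. A minimal sketch (the object name ghmodel.agq is mine):

## check the Laplace fit against adaptive Gauss-Hermite quadrature
## with more quadrature points
ghmodel.agq <- lme4::glmer(y ~ x1 + x2 + (1 | indiv), data = mixdata,
                           family = poisson, nAGQ = 10L)
fixef(ghmodel.agq)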

On this simulated dataset, glmer() works well: the estimates of the fixed effects are close to their true values. On larger and more complex datasets, however, I have found that this approach can be very slow and can give wrong answers.

Bayesian inference with Stan

Where approximate methods do not work, we can use a fully Bayesian approach, relying on Markov chain Monte Carlo simulation to sample from the posterior.

Writing the Stan code directly gives us more flexibility than relying on the rstanarm package. It's also faster.
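For reference, the rstanarm equivalent of the glmer fit above would be a one-line call (a sketch, not run here):

## not run: rstanarm equivalent of the glmer fit above
## library(rstanarm)
## stan_glmer(y ~ x1 + x2 + (1 | indiv), data = mixdata, family = poisson)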

// mixed model with Poisson likelihood
data {
  int<lower=1> N; // number of observations
  int<lower=1> U; // number of columns in the X matrix
  int<lower=1> N_indivs; // number of individuals
  int<lower=1> indiv[N]; // subscripts indexing individuals
  real priormean_icept; // prior mean of intercept (declared but not used below)
  real<lower=0> scale_icept; // sd of the prior on the coefficients theta
  int<lower=0> y[N]; // count of events
  matrix[N, U] X; // X matrix, without a column of ones for the intercept
}

transformed data { 
  matrix[N, U] Q_ast; // QR reparameterization
  matrix[U, U] R_ast;
  matrix[U, U] R_ast_inverse;
  // thin and scale the QR decomposition
  Q_ast = qr_Q(X)[, 1:U] * sqrt(N - 1);
  R_ast = qr_R(X)[1:U, ] / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}

parameters {
  real alpha; // mean of the individual random effects
  real<lower=0> sigma_indiv; // SD of random effects
  vector[U] theta; // coefficients on Q_ast
  vector[N_indivs] xi; // standardized random effects, given a standard normal prior
}

transformed parameters {
  vector[N] eta; // linear predictor on scale of log hazard rate
  vector[N_indivs] w; // individual random effects 
  eta = Q_ast * theta; 
  w = alpha + xi * sigma_indiv; // construct w by scaling xi by sigma_indiv and shifting by alpha
}

model {
  sigma_indiv ~ cauchy(0, 5); // half-Cauchy prior (sigma_indiv constrained positive)
  alpha ~ normal(0, 10);
  theta ~ normal(0, scale_icept);
  xi ~ normal(0, 1); // implies w ~ normal(alpha, sigma_indiv)
  y ~ poisson_log(eta + w[indiv]); // vectorized Poisson likelihood, log link
}

generated quantities {
  vector[U] beta;
  beta = R_ast_inverse * theta; // coefficients on x
}

The Stan code is just a generalized linear model with Poisson likelihood and logarithmic link function, with a random effect for each individual. For efficient sampling there is a QR reparameterization of the regression coefficients, and a non-centred reparameterization of the random effects, as described in the Stan user manual.
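Concretely, the QR reparameterization samples $\theta = R^\ast \beta$, for which the design matrix $Q^\ast$ has orthogonal columns, and recovers $\beta = (R^\ast)^{-1} \theta$ in the generated quantities block. The non-centred parameterization replaces $w_i \sim N(\alpha, \sigma_{\text{indiv}}^2)$ with

$$ w_i = \alpha + \sigma_{\text{indiv}} \, \xi_i, \qquad \xi_i \sim N(0, 1), $$

which removes the strong prior dependence between the random effects and their scale parameter that otherwise produces a funnel-shaped posterior.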

## version with stan code written directly
require(rstan)
if(newrun) {
glmmpoisson.model <- stan_model(file="./glmm_poisson.stan")

pars.names <- c("alpha", "beta", "sigma_indiv", "lp__")

## MCMC samples: with the data argument omitted, rstan looks up the
## objects named in the Stan data block in the calling environment
glmm.samples.mc <- sampling(glmmpoisson.model,
                          iter=1000, warmup=500,
                          chains=4, seed=123, open_progress=FALSE,
                          pars=pars.names)
}
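Alternatively, the data can be passed explicitly as a named list whose elements match the Stan data block (a sketch):

## explicit data list matching the names in the Stan data block
xdata.list <- list(N = N, U = U, N_indivs = N_indivs, indiv = indiv,
                   priormean_icept = priormean_icept, scale_icept = scale_icept,
                   y = y, X = X)
## then: sampling(glmmpoisson.model, data = xdata.list, ...)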

pairs(glmm.samples.mc, pars=pars.names)

stan_trace(glmm.samples.mc, pars=pars.names)

stan_dens(glmm.samples.mc, pars=pars.names)

pander(round(summary(glmm.samples.mc,
                    probs=c(0.1, 0.9))$summary, 3))
                 mean   se_mean      sd      10%      90%   n_eff    Rhat
alpha          -0.072     0.022   0.162   -0.287    0.14    55.43   1.081
beta[1]        -0.435     0.021   0.159   -0.625   -0.236   54.44   1.081
beta[2]         0.003     0.022   0.161   -0.211    0.215   54.7    1.07
sigma_indiv     2.135     0.01    0.119    1.987    2.283  151.3    1.022
lp__            82044     0.954  14.07     82026    82062  217.3    1.015

The most useful diagnostic is the pairs plot. This shows that the parameter sigma_indiv is correlated with the log posterior density (whose negative plays the role of the potential energy in the Hamiltonian dynamics used by the sampler). This might be fixed by some reparameterization.

The trace plots show that the sampling chains are mixing rather slowly. The effective sample size is small because of the autocorrelation of the sampled values, and the R-hat values for alpha and beta approach 1.1, indicating that the variance between chains is still appreciably larger than the variance within chains.

In principle these limitations could be overcome either by a clever reparameterization of the model, or by running the sampler for longer, as sketched below. As Stan performs well enough on the real datasets we are modelling, we won't pursue the reparameterization here.
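A minimal sketch of a longer run, with a higher target acceptance rate to force a smaller step size (the object name glmm.samples.long is hypothetical):

## longer run with a higher target acceptance rate
glmm.samples.long <- sampling(glmmpoisson.model,
                              iter = 4000, warmup = 1000,
                              chains = 4, seed = 123,
                              control = list(adapt_delta = 0.95),
                              pars = pars.names)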

Changepoint model

We can extend this Stan model to include a changepoint in x, at which the slope of the log rate changes from beta to beta + gamma. This will be demonstrated with a simulated dataset with a changepoint at x = 0, where the slope changes from -0.5 to 0.
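In symbols, the log rate for individual $i$ is

$$ \log \lambda_i = b_i + \beta x_i + \gamma \max(x_i, 0), $$

so the slope in $x$ is $\beta$ below the changepoint and $\beta + \gamma$ above it.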

if(newrun) {
    ## simulate from model with a changepoint at x = 0
    gamma <- 0.5 # change in slope above the changepoint
    x <- rnorm(N.indivs, 0, 1) # covariate with a changepoint in its effect
    lambda <- exp(logbaseline + beta * x + gamma * ifelse(x < 0, 0, x))
    mixdata.chpt <- NULL
    for(i in 1:N.indivs) {
        mixdata.chpt <- rbind(mixdata.chpt,
                              data.frame(indiv=as.factor(rep(i, T[i])),
                                         time=1:T[i],
                                         x=rep(x[i], T[i]),
                                         x1=rep(x1[i], T[i]),
                                         x2=rep(x2[i], T[i]),
                                         y=rpois(T[i], lambda[i])
                                         )) 
    }
}
# head(mixdata.chpt)
y <- mixdata.chpt$y
x <- mixdata.chpt$x # changepoint covariate, handled separately from X
X <- model.matrix(object =  y ~ x1 + x2, data = mixdata.chpt)[, -1]
X <- scale(X, center=TRUE, scale=FALSE) # centre covariates
U <- ncol(X)
indiv <- as.integer(mixdata.chpt$indiv)
scale_icept <- 5