Generalized Linear Mixed-Effects Models

What Are Generalized Linear Mixed-Effects Models?

Generalized linear mixed-effects (GLME) models describe the relationship between a response variable and independent variables using coefficients that can vary with respect to one or more grouping variables, for data with a response variable distribution other than normal. You can think of GLME models as extensions of generalized linear models (GLM) for data that are collected and summarized in groups. Alternatively, you can think of GLME models as a generalization of linear mixed-effects models (LME) for data where the response variable is not normally distributed.

A mixed-effects model consists of fixed-effects and random-effects terms. Fixed-effects terms are usually the conventional linear regression part of the model. Random-effects terms are associated with individual experimental units drawn at random from a population, and account for variations between groups that might affect the response. The random effects have prior distributions, whereas the fixed effects do not.

GLME Model Equations

The standard form of a generalized linear mixed-effects model is

$y_{i} | b \sim \mathrm{Distr}\left(μ_{i}, \frac{σ^{2}}{w_{i}}\right)$

$g (μ) = X β + Z b + δ,$

where

y is an n-by-1 response vector, and y_i is its ith element.
b is the random-effects vector.
Distr is a specified conditional distribution of y given b.
μ is the conditional mean of y given b, and μ_i is its ith element.
σ² is the dispersion parameter.
w is the effective observation weight vector, and w_i is the weight for observation i.
- For a binomial distribution, the effective observation weight is equal to the prior weight specified using the 'Weights' name-value pair argument in fitglme, multiplied by the binomial size specified using the 'BinomialSize' name-value pair argument.
- For all other distributions, the effective observation weight is equal to the prior weight specified using the 'Weights' name-value pair argument in fitglme.
g(μ) is a link function that defines the relationship between the mean response μ and the linear combination of the predictors.
X is an n-by-p fixed-effects design matrix.
β is a p-by-1 fixed-effects vector.
Z is an n-by-q random-effects design matrix.
b is a q-by-1 random-effects vector.
δ is a model offset vector.

The model for the mean response μ is

$μ = g^{- 1} (η),$

where g^-1 is the inverse of the link function g(μ), and η is the linear predictor of the fixed and random effects of the generalized linear mixed-effects model

$η = X β + Z b + δ .$

A GLME model is parameterized by β, θ, and σ².

The assumptions for generalized linear mixed-effects models are:

The random effects vector b has the prior distribution:
$b | σ^{2}, θ \sim N (0, σ^{2} D (θ)),$
where σ² is the dispersion parameter, and D is a symmetric and positive semidefinite matrix parameterized by an unconstrained parameter vector θ.
The observations y_i are conditionally independent given b.
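To make the roles of X, Z, b, δ, and the link function concrete, the following MATLAB sketch simulates one data set from a Poisson GLME with a log link and a single random intercept per group. All sizes and parameter values (nGroups, nPerGroup, beta, sigmaB) are illustrative assumptions rather than values from this page, and poissrnd requires Statistics and Machine Learning Toolbox.

```matlab
rng(0);                                   % fix the seed so the sketch is repeatable

nGroups   = 20;                           % hypothetical number of groups (q)
nPerGroup = 5;                            % hypothetical observations per group
n         = nGroups * nPerGroup;          % total number of observations

group = repelem((1:nGroups)', nPerGroup); % group index of each observation

X     = [ones(n,1), randn(n,1)];          % n-by-p fixed-effects design matrix (p = 2)
beta  = [1; -0.5];                        % hypothetical fixed-effects vector
Z     = double(group == 1:nGroups);       % n-by-q indicator matrix for random intercepts
delta = zeros(n,1);                       % model offset (none in this sketch)

sigmaB = 0.3;                             % hypothetical random-effect standard deviation
b      = sigmaB * randn(nGroups,1);       % b ~ N(0, sigmaB^2 * I)

eta = X*beta + Z*b + delta;               % linear predictor
mu  = exp(eta);                           % inverse of the log link
y   = poissrnd(mu);                       % y_i | b ~ Poisson(mu_i), conditionally independent
```

Because the dispersion parameter of a Poisson model is fixed at 1, the prior covariance σ²D(θ) reduces here to sigmaB^2 times the identity matrix.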

Prepare Data for Model Fitting

To fit a GLME model to your data, use fitglme. Format your input data using the table data type. Each row of the table represents one observation, and each column represents one variable (the response, a predictor, or a grouping variable). For more information on creating and using table, see Create Tables and Assign Data to Them.

Input data can include continuous and grouping variables. fitglme assumes that predictors using the following data types are categorical:

Logical
Categorical
Character vector or character array
String array
Cell array of character vectors

If the input data table contains any NaN values, then fitglme excludes that entire row of data from the fit. To exclude additional rows of data, you can use the 'Exclude' name-value pair argument of fitglme when fitting the model.
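As a minimal sketch of this layout, the following code builds a small table by hand and flags the rows that would not take part in a fit. The variable names (y, x, group) and all values are hypothetical and serve only to illustrate the structure.

```matlab
% Hypothetical toy data: six observations from three groups.
y     = [2; 0; 3; 1; 4; 2];                      % response (counts)
x     = [0.1; 0.4; NaN; 0.9; 1.2; 1.5];          % continuous predictor with one missing value
group = categorical({'a';'a';'b';'b';'c';'c'});  % grouping variable

tbl = table(y, x, group);                        % one row per observation

% Rows that contain a missing value; fitglme drops such rows automatically.
rowsWithMissing = any(ismissing(tbl), 2);

% Further rows to leave out could be passed to fitglme as
% 'Exclude', extraRows (a logical or numeric index vector).
extraRows    = false(height(tbl), 1);
extraRows(6) = true;

% Rows that would remain in the fit under these assumptions.
usedRows = ~(rowsWithMissing | extraRows);
disp(tbl(usedRows, :))
```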

Choose a Distribution Type for the Model

GLME models are used when the response data does not follow a normal distribution. Therefore, when fitting a model using fitglme, you must specify the response distribution type using the 'Distribution' name-value pair argument. Often, the type of response data suggests the appropriate distribution type for the model.

Type of Response Data	Suggested Response Distribution Type
Any real number	`'Normal'`
Any positive number	`'Gamma'` or `'InverseGaussian'`
Any nonnegative integer	`'Poisson'`
Integer from 0 to n, where n is a fixed positive value	`'Binomial'`

Choose a Link Function for the Model

GLME models use a link function, g, to map the relationship between the mean response and the linear combination of the predictors. By default, fitglme uses the canonical link function for the specified response distribution. You can instead specify one of the predefined link functions listed in the following table, or define your own, using the 'Link' name-value pair argument of fitglme.

Value	Description
`'comploglog'`	`g(mu) = log(-log(1-mu))`
`'identity'`	`g(mu) = mu` Canonical link for the normal distribution.
`'log'`	`g(mu) = log(mu)` Canonical link for the Poisson distribution.
`'logit'`	`g(mu) = log(mu/(1-mu))` Canonical link for the binomial distribution.
`'loglog'`	`g(mu) = log(-log(mu))`
`'probit'`	`g(mu) = norminv(mu)`
`'reciprocal'`	`g(mu) = mu.^(-1)`
Scalar value `P`	`g(mu) = mu.^P`
Structure `S`	A structure containing four fields whose values are function handles: `S.Link` (link function), `S.Derivative` (derivative), `S.SecondDerivative` (second derivative), and `S.Inverse` (inverse of link). If `'FitMethod'` is `'MPL'` or `'REMPL'`, or if `S` represents a canonical link for the specified distribution, you can omit the specification of `S.SecondDerivative`.

When fitting a model to data, fitglme uses the canonical link function by default.

Distribution	Default Link Function
`'Normal'`	`'identity'`
`'Binomial'`	`'logit'`
`'Poisson'`	`'log'`
`'Gamma'`	`-1` (power link, `g(mu) = mu.^(-1)`)
`'InverseGaussian'`	`-2` (power link, `g(mu) = mu.^(-2)`)

The link functions 'comploglog', 'loglog', and 'probit' are mainly useful for binomial models.
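The structure form of the 'Link' argument can be sketched as follows. This example deliberately reproduces the built-in 'log' link so that the four function handles are easy to check by hand. The reduced formula is an illustrative assumption, and mfr is the sample data set used later on this page.

```matlab
% Custom link structure that mirrors the built-in 'log' link.
S = struct();
S.Link             = @(mu) log(mu);       % g(mu)
S.Derivative       = @(mu) 1 ./ mu;       % first derivative of g
S.SecondDerivative = @(mu) -1 ./ mu.^2;   % second derivative of g
S.Inverse          = @(eta) exp(eta);     % inverse of g

% Sanity check: the inverse should undo the link.
muTest         = [0.5 1 2.5];
roundTripError = max(abs(S.Inverse(S.Link(muTest)) - muTest));

% Pass the structure through the 'Link' name-value pair argument.
load mfr
glmeCustom = fitglme(mfr, 'defects ~ 1 + newprocess + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', S, 'FitMethod', 'Laplace');
```

For a genuinely custom link, replace the four handles with the link function, its first two derivatives, and its inverse, keeping all of them vectorized.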

Specify the Model Formula

Model specification for fitglme uses Wilkinson notation, which is a character vector or string scalar of the form 'y ~ terms', where y is the response variable name, and terms is written in the following notation.

Wilkinson Notation	Factors in Standard Notation
`1`	Constant (intercept) term
`X^k`, where `k` is a positive integer	`X`, `X²`, ..., `X^k`
`X1 + X2`	`X1`, `X2`
`X1*X2`	`X1`, `X2`, `X1.*X2` (element-wise multiplication of `X1` and `X2`)
`X1:X2`	`X1.*X2` only
`- X2`	Do not include `X2`
`X1*X2 + X3`	`X1`, `X2`, `X3`, `X1*X2`
`X1 + X2 + X3 + X1:X2`	`X1`, `X2`, `X3`, `X1*X2`
`X1*X2*X3 - X1:X2:X3`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`, `X2*X3`
`X1*(X2 + X3)`	`X1`, `X2`, `X3`, `X1*X2`, `X1*X3`

Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include -1 in the formula.

For generalized linear mixed-effects models, the formula specification is of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively.

Suppose the input data table contains the following:

A response variable, y
Predictor variables, X1, X2, ..., XJ, where J is the total number of predictor variables (including continuous and grouping variables).
Grouping variables, g1, g2, ..., gR, where R is the number of grouping variables.

The grouping variables in XJ and gR can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation as follows.

Formula	Description
`'y ~ X1 + X2'`	Fixed effects for the intercept, `X1`, and `X2`. This formula is equivalent to `'y ~ 1 + X1 + X2'`.
`'y ~ -1 + X1 + X2'`	No intercept, with fixed effects for `X1` and `X2`. The implicit intercept term is suppressed by including `-1`.
`'y ~ 1 + (1 | g1)'`	A fixed effect for the intercept, plus a random effect for the intercept for each level of the grouping variable `g1`.
`'y ~ X1 + (1 | g1)'`	Random intercept model with a fixed slope.
`'y ~ X1 + (X1 | g1)'`	Random intercept and slope, with possible correlation between them. This formula is equivalent to `'y ~ 1 + X1 + (1 + X1 | g1)'`.
`'y ~ X1 + (1 | g1) + (-1 + X1 | g1)'`	Independent random-effects terms for intercept and slope.
`'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'`	Random intercept model with independent main effects for `g1` and `g2`, plus an independent interaction effect.

For example, the sample data mfr contains simulated data from a manufacturing company that operates 50 factories across the world. Each factory runs a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches), and recorded data on processing time (time_dev), temperature (temp_dev), number of defects (defects), and a categorical variable indicating the raw materials supplier (supplier) for each batch.

To determine whether the new process (represented by the predictor variable newprocess) significantly reduces the number of defects, fit a GLME model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution.

The number of defects can be modeled using a Poisson distribution:

$\mathrm{defects}_{ij} \sim \mathrm{Poisson}(μ_{ij})$

This corresponds to the generalized linear mixed-effects model

$\begin{array}{l} \log (μ_{ij}) = β_{0} + β_{1}\,\mathrm{newprocess}_{ij} + β_{2}\,\mathrm{time\_dev}_{ij} \\ \quad + β_{3}\,\mathrm{temp\_dev}_{ij} + β_{4}\,\mathrm{supplier\_C}_{ij} + β_{5}\,\mathrm{supplier\_B}_{ij} + b_{i}, \end{array}$

where

defects_ij is the number of defects observed in the batch produced by factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
μ_ij is the mean number of defects corresponding to factory i during batch j.
supplier_C_ij and supplier_B_ij are dummy variables that indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
b_i ~ N(0,σ_b²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.

Using Wilkinson notation, specify this model as:

'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)'

To account for the Poisson distribution of the response variable, when fitting the model using fitglme, specify the 'Distribution' name-value pair argument as 'Poisson'. By default, fitglme uses a log link function for response variables with a Poisson distribution.
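The formula patterns from the earlier table carry over directly to the mfr variables. The sketch below fits the random-intercept model specified above and, for comparison, a variant with a correlated random slope for time_dev. The variant is an illustrative assumption and is not a model recommended on this page; depending on the data, it may produce convergence warnings.

```matlab
load mfr

% Random intercept for each factory (the model specified above).
f1 = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)';

% Hypothetical variant: correlated random intercept and time_dev slope per factory.
f2 = 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 + time_dev|factory)';

% Fit both specifications with a Poisson response (log link by default).
glme1 = fitglme(mfr, f1, 'Distribution', 'Poisson');
glme2 = fitglme(mfr, f2, 'Distribution', 'Poisson');

% Show how fitglme parsed each formula.
disp(glme1.Formula)
disp(glme2.Formula)
```

To make the intercept and slope independent instead, the random part of the second formula would be written as (1|factory) + (-1 + time_dev|factory).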

Display the Model

The output of the fitting function fitglme provides information about the fitted generalized linear mixed-effects model.

Using the mfr manufacturing experiment data, fit a model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Specify the response distribution as Poisson, the link function as log, and the fit method as Laplace.

load mfr

glme = fitglme(mfr,...
		'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',...
		'Distribution','Poisson','Link','log','FitMethod','Laplace',...
		'DummyVarCoding','effects')

glme = 


Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                    Poisson
    Link                            Log   
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35  

Fixed effects coefficients (95% CIs):
    Name                 Estimate     SE          tStat       DF    pValue    
    '(Intercept)'           1.4689     0.15988      9.1875    94    9.8194e-15
    'newprocess'          -0.36766     0.17755     -2.0708    94      0.041122
    'time_dev'           -0.094521     0.82849    -0.11409    94       0.90941
    'temp_dev'            -0.28317      0.9617    -0.29444    94       0.76907
    'supplier_C'         -0.071868    0.078024     -0.9211    94       0.35936
    'supplier_B'          0.071072     0.07739     0.91836    94       0.36078


    Lower        Upper    
       1.1515       1.7864
     -0.72019    -0.015134
      -1.7395       1.5505
      -2.1926       1.6263
     -0.22679     0.083051
    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                Name2                Type         Estimate
    '(Intercept)'        '(Intercept)'        'std'        0.31381 

Group: Error
    Name                      Estimate
    'sqrt(Dispersion)'        1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model. These include the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the log likelihood (LogLikelihood), and the deviance (Deviance).

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects coefficient, and each column contains statistics corresponding to that coefficient. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory grouping variable, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
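A minimal sketch of that call, refitting the same model as above so that the block is self-contained. The output variable names (psi, dispersion, stats, stats99) are arbitrary.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'FitMethod', 'Laplace', ...
    'DummyVarCoding', 'effects');

% psi:        estimated prior covariance of the random effects, one cell per grouping variable
% dispersion: estimated dispersion parameter
% stats:      covariance parameter estimates with confidence limits
[psi, dispersion, stats] = covarianceParameters(glme);

% First cell: parameters for the factory grouping variable, including Lower and Upper.
disp(stats{1})

% The same call with 99% instead of the default 95% confidence limits.
[~, ~, stats99] = covarianceParameters(glme, 'Alpha', 0.01);
disp(stats99{1})
```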

Work with the Model

After you create a GLME model using fitglme, you can use additional functions to work with the model.

Inspect and Test Coefficients and Confidence Intervals

To extract estimates of the fixed- and random-effects coefficients, covariance parameters, design matrices, and related statistics:

fixedEffects extracts estimated fixed-effects coefficients and related statistics from a fitted model. Related statistics include the standard error; the t-statistic, degrees of freedom, and p-value for a hypothesis test of whether each parameter is equal to 0; and the confidence intervals.
randomEffects extracts estimated random-effects coefficients and related statistics from a fitted GLME model. Related statistics include the estimated empirical Bayes predictor (EBP) of each random effect; the square root of the conditional mean squared error of prediction (CMSEP) given the covariance parameters and the response; the t-statistic, estimated degrees of freedom, and p-value for a hypothesis test of whether each random effect is equal to 0; and the confidence intervals.
covarianceParameters extracts estimated covariance parameters and related statistics from a fitted GLME model. Related statistics include the estimate of each covariance parameter and its confidence interval.
designMatrix extracts the fixed- and random-effects design matrices, or a specified subset thereof, from the fitted GLME model.
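The extraction functions listed above can be exercised on the manufacturing model as in the following sketch. The model is refit here so that the block runs on its own, and the output variable names are arbitrary.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'FitMethod', 'Laplace', ...
    'DummyVarCoding', 'effects');

% Fixed-effects estimates, their names, and the associated test statistics.
[beta, betaNames, feStats] = fixedEffects(glme);

% Empirical Bayes predictors of the 20 factory-level random intercepts.
[B, BNames, reStats] = randomEffects(glme);

% Design matrices X (fixed effects) and Z (random effects).
X = designMatrix(glme, 'Fixed');
Z = designMatrix(glme, 'Random');

% With a log link, exp(beta) is a multiplicative effect on the conditional mean.
% For the newprocess estimate shown earlier (-0.36766), this is about 0.69.
rateRatio = exp(beta(strcmp(betaNames.Name, 'newprocess')));
disp(rateRatio)
```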

To conduct customized hypothesis tests for the significance of fixed- and random-effects coefficients, and to compute custom confidence intervals:

anova performs a marginal F-test (hypothesis test) on fixed-effects terms, to determine if all coefficients representing the fixed-effects terms are equal to 0. You can use anova to test the combined significance of the coefficients of categorical predictors.
coefCI computes confidence intervals for fixed- and random-effects parameters from a fitted GLME model. By default, fitglme computes 95% confidence intervals. Use coefCI to compute the boundaries at a different confidence level.
coefTest performs custom hypothesis tests on fixed-effects or random-effects vectors of a fitted generalized linear mixed-effects model. For example, you can specify contrast matrices.
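As a sketch of these three functions on the manufacturing model, the code below runs the marginal F-tests, computes 99% confidence intervals, and tests the two supplier coefficients jointly with a contrast matrix. The contrast matrix H is an assumption that relies on the coefficient order shown in the model display (intercept, newprocess, time_dev, temp_dev, supplier_C, supplier_B).

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'FitMethod', 'Laplace', ...
    'DummyVarCoding', 'effects');

% Marginal F-test for each fixed-effects term (supplier is tested as one term).
anovaStats = anova(glme);
disp(anovaStats)

% 99% confidence intervals for the fixed-effects coefficients.
ci99 = coefCI(glme, 'Alpha', 0.01);

% Joint test that both supplier coefficients are zero (H*beta = 0).
H = [0 0 0 0 1 0;
     0 0 0 0 0 1];
pSupplier = coefTest(glme, H);
disp(pSupplier)
```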

Generate New Response Values and Refit Model

To generate new response values, including fitted, predicted, and random responses, based on the fitted GLME model:

fitted computes fitted response values using the original predictor values, and the estimated coefficient and parameter values from the fitted model.
predict computes the predicted conditional or marginal mean of the response using either the original predictor values or new predictor values, and the estimated coefficient and parameter values from the fitted model.
random generates random responses from a fitted model.
refit creates a new fitted GLME model, based on the original model and a new response vector.
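The following sketch strings these four functions together on the manufacturing model. Reusing the first five rows of mfr as "new" data is only a convenience for illustration, and the seed passed to rng is arbitrary.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'FitMethod', 'Laplace', ...
    'DummyVarCoding', 'effects');

% Fitted conditional means (fixed and random effects) and marginal means (fixed effects only).
muCond = fitted(glme);
muMarg = fitted(glme, 'Conditional', false);

% Predicted conditional means for a table of predictor values.
tNew  = mfr(1:5, :);
muNew = predict(glme, tNew);

% Simulate a new response vector from the fitted model, then refit to it.
rng(1)                        % arbitrary seed so the simulation is repeatable
ySim    = random(glme);
glmeSim = refit(glme, ySim);
```

Repeating the last two lines many times is one way to build a simulation-based check of the fitted model, since each refit uses the same formula and options as the original fit.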

Inspect and Visualize Residuals

To extract and visualize residuals from the fitted GLME model:

residuals extracts the raw or Pearson residuals from the fitted model. You can also specify whether to compute the conditional or marginal residuals.
plotResiduals creates plots using the raw or Pearson residuals from the fitted model, including:
- A histogram of the residuals
- A scatterplot of the residuals versus fitted values
- A scatterplot of residuals versus lagged residuals
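A minimal sketch of both functions on the manufacturing model, covering the three plot types listed above. The model is refit so that the block is self-contained.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution', 'Poisson', 'Link', 'log', 'FitMethod', 'Laplace', ...
    'DummyVarCoding', 'effects');

% Conditional Pearson residuals and marginal raw residuals.
rCondPearson = residuals(glme, 'ResidualType', 'Pearson');
rMargRaw     = residuals(glme, 'Conditional', false, 'ResidualType', 'raw');

% Histogram of the residuals.
figure
plotResiduals(glme, 'histogram', 'ResidualType', 'Pearson')

% Residuals versus fitted values.
figure
plotResiduals(glme, 'fitted', 'ResidualType', 'Pearson')

% Residuals versus lagged residuals.
figure
plotResiduals(glme, 'lagged', 'ResidualType', 'Pearson')
```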