## Generalized Linear Mixed-Effects Models

### What Are Generalized Linear Mixed-Effects Models?

Generalized linear mixed-effects (GLME) models describe the relationship between a response variable and independent variables using coefficients that can vary with respect to one or more grouping variables, for data with a response variable distribution other than normal. You can think of GLME models as extensions of generalized linear models (GLM) for data that are collected and summarized in groups. Alternatively, you can think of GLME models as a generalization of linear mixed-effects models (LME) for data where the response variable is not normally distributed.

A mixed-effects model consists of fixed-effects and random-effects terms. Fixed-effects terms are usually the conventional linear regression part of the model. Random-effects terms are associated with individual experimental units drawn at random from a population, and account for variations between groups that might affect the response. The random effects have prior distributions, whereas the fixed effects do not.

### GLME Model Equations

The standard form of a generalized linear mixed-effects model is

`$\begin{array}{c}{y}_{i}|b\end{array}\sim Distr\left({\mu }_{i},\frac{{\sigma }^{2}}{{w}_{i}}\right)$`
`$g\left(\mu \right)=X\beta +Zb+\delta \text{\hspace{0.17em}},$`

where

• y is an n-by-1 response vector, and yi is its ith element.

• b is the random-effects vector.

• Distr is a specified conditional distribution of y given b.

• μ is the conditional mean of y given b, and μi is its ith element.

• σ2 is the dispersion parameter.

• w is the effective observation weight vector, and wi is the weight for observation i.

• For a binomial distribution, the effective observation weight is equal to the prior weight specified using the `'Weights'` name-value pair argument in `fitglme`, multiplied by the binomial size specified using the `'BinomialSize'` name-value pair argument.

• For all other distributions, the effective observation weight is equal to the prior weight specified using the `'Weights'` name-value pair argument in `fitglme`.

• g(μ) is a link function that defines the relationship between the mean response μ and the linear combination of the predictors.

• X is an n-by-p fixed-effects design matrix.

• β is a p-by-1 fixed-effects vector.

• Z is an n-by-q random-effects design matrix.

• b is a q-by-1 random-effects vector.

• δ is a model offset vector.

The model for the mean response μ is

`$\mu ={g}^{-1}\left(\eta \right)\text{\hspace{0.17em}},$`

where g-1 is inverse of the link function g(μ), and ${\stackrel{^}{\eta }}_{ME}$ is the linear predictor of the fixed and random effects of the generalized linear mixed-effects model

`$\eta =X\beta +Zb+\delta \text{\hspace{0.17em}}.$`

A GLME model is parameterized by β, θ, and σ2.

The assumptions for generalized linear mixed-effects models are:

• The random effects vector b has the prior distribution:

`$b|{\sigma }^{2},\theta \sim N\left(0,{\sigma }^{2}D\left(\theta \right)\right)\text{\hspace{0.17em}},$`

where σ2 is the dispersion parameter, and D is a symmetric and positive semidefinite matrix parameterized by an unconstrained parameter vector θ.

• The observations yi are conditionally independent given b.

### Prepare Data for Model Fitting

To fit a GLME model to your data, use `fitglme`. Format your input data using the `table` data type. Each row of the table represents one observation, and each column represents one predictor variable. For more information on creating and using `table`, see Create Tables and Assign Data to Them.

Input data can include continuous and grouping variables. `fitglme` assumes that predictors using the following data types are categorical:

• Logical

• Categorical

• Character vector or character array

• String array

• Cell array of character vectors

If the input data table contains any `NaN` values, then `fitglme` excludes that entire row of data from the fit. To exclude additional rows of data, you can use the `'Exclude'` name-value pair argument of `fitglme` when fitting the model.

### Choose a Distribution Type for the Model

GLME models are used when the response data does not follow a normal distribution. Therefore, when fitting a model using `fitglme`, you must specify the response distribution type using the `'Distribution'` name-value pair argument. Often, the type of response data suggests the appropriate distribution type for the model.

Type of Response DataSuggested Response Distribution Type
Any real number`'Normal'`
Any positive number`'Gamma'` or `'InverseGaussian'`
Any nonnegative integer`'Poisson'`
Integer from 0 to n, where n is a fixed positive value`'Binomial'`

### Choose a Link Function for the Model

GLME models use a link function, g, to map the relationship between the mean response and the linear combination of the predictors. By default, `fitglme` uses a predefined, commonly accepted link function based on the specified distribution of the response data, as shown in the following table. However, you can specify a different link function from the list of predefined functions, or define your own, using the `'Link'` name-value pair argument of `fitglme`.

ValueDescription
`'comploglog'``g(mu) = log(-log(1-mu))`
`'identity'`

`g(mu) = mu`

Canonical link for the normal distribution.

`'log'`

`g(mu) = log(mu)`

Canonical link for the Poisson distribution.

`'logit'`

`g(mu) = log(mu/(1-mu))`

Canonical link for the binomial distribution.

`'loglog'``g(mu) = log(-log(mu))`
`'probit'``g(mu) = norminv(mu)`
`'reciprocal'``g(mu) = mu.^(-1)`
Scalar value `P``g(mu) = mu.^P`
Structure `S`

A structure containing four fields whose values are function handles:

• `S.Link` — Link function

• `S.Derivative` — Derivative

• `S.SecondDerivative` — Second derivative

• `S.Inverse` — Inverse of link

If `'FitMethod'` is `'MPL'` or `'REMPL'`, or if `S` represents a canonical link for the specified distribution, you can omit the specification of `S.SecondDerivative`.

When fitting a model to data, `fitglme` uses the canonical link function by default.

`'Normal'``'identity'`
`'Binomial'``'logit'`
`'Poisson'``'log'`
`'Gamma'``-1`
`'InverseGaussian'``-2`

The link functions `'comploglog'`, `'loglog'`, and `'probit'` are mainly useful for binomial models.

### Specify the Model Formula

Model specification for `fitglme` uses Wilkinson notation, which is a character vector or string scalar of the form `'y ~ terms'`, where `y` is the response variable name, and `terms` is written in the following notation.

Wilkinson NotationFactors in Standard Notation
`1`Constant (intercept) term
`X^k`, where `k` is a positive integer`X`, `X2`, ..., `Xk`
`X1 + X2``X1`, `X2`
`X1*X2``X1`, `X2`, ```X1.*X2 (element-wise multiplication of X1 and X2)```
`X1:X2``X1.*X2` only
`- X2`Do not include `X2`
`X1*X2 + X3``X1`, `X2`, `X3`, `X1*X2`
`X1 + X2 + X3 + X1:X2``X1`, `X2`, `X3`, `X1*X2`
`X1*X2*X3 - X1:X2:X3``X1`, `X2`, `X3`, `X1*X2`, `X1*X3`, `X2*X3`
`X1*(X2 + X3)``X1`, `X2`, `X3`, `X1*X2`, `X1*X3`

Formulas include a constant (intercept) term by default. To exclude a constant term from the model, include `–1` in the formula.

For generalized linear mixed-effects models, the formula specification is of the form `'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'`, where `fixed` and `random` contain the fixed-effects and the random-effects terms, respectively.

Suppose the input data table contains the following:

• A response variable, `y`

• Predictor variables, `X1`, `X2`, ..., `XJ`, where J is the total number of predictor variables (including continuous and grouping variables).

• Grouping variables, `g1`, `g2`, ..., `gR`, where R is the number of grouping variables.

The grouping variables in `XJ` and `gR` can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form ```'y ~ fixed + (random1|g1) + ... + (randomR|gR)'```, the term `fixed` corresponds to a specification of the fixed-effects design matrix `X`, `random1` is a specification of the random-effects design matrix `Z1` corresponding to grouping variable `g1`, and similarly `randomR` is a specification of the random-effects design matrix `ZR` corresponding to grouping variable `gR`. You can express the `fixed` and `random` terms using Wilkinson notation as follows.

`'y ~ X1 + X2'`Fixed effects for the intercept, `X1`, and `X2`. This formula is equivalent to `'y ~ 1 + X1 + X2'`.
`'y ~ -1 + X1 + X2'`No intercept, with fixed effects for `X1` and `X2`. The implicit intercept term is suppressed by including `-1`.
`'y ~ 1 + (1 | g1)'`A fixed effect for the intercept, plus a random effect for the intercept for each level of the grouping variable `g1`.
`'y ~ X1 + (1 | g1)'`Random intercept model with a fixed slope.
`'y ~ X1 + (X1 | g1)'`Random intercept and slope, with possible correlation between them. This formula is equivalent to `'y ~ 1 + X1 + (1 + X1|g1)'`.
`'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' `Independent random-effects terms for intercept and slope.
`'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'`Random intercept model with independent main effects for `g1` and `g2`, plus an independent interaction effect.

For example, the sample data `mfr` contains simulated data from a manufacturing company that operates 50 factories across the world. Each factory runs a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches), and recorded data on processing time (`time_dev`), temperature (`temp_dev`), number of defects (`defects`), and a categorical variable indicating the raw materials supplier (`supplier`) for each batch.

To determine whether the new process (represented by the predictor variable `newprocess`) significantly reduces the number of defects, fit a GLME model using `newprocess`, `time_dev`, `temp_dev`, and `supplier` as fixed-effects predictors. Include a random-effects intercept grouped by `factory`, to account for quality differences that might exist due to factory-specific variations. The response variable `defects` has a Poisson distribution.

The number of defects can be modeled using a Poisson distribution

`$defect{s}_{ij}~Poisson\left({\mu }_{ij}\right)$`

This corresponds to the generalized linear mixed-effects model

`$\begin{array}{l}\mathrm{log}\left({\mu }_{ij}\right)={\beta }_{0}+{\beta }_{1}newproces{s}_{ij}+{\beta }_{2}time_de{v}_{ij}\text{\hspace{0.17em}}\\ \text{\hspace{0.17em}}\text{ }\text{ }\text{ }+{\beta }_{3}temp_de{v}_{ij}+{\beta }_{4}supplier_{C}_{ij}+{\beta }_{5}supplier_{B}_{ij}+{b}_{i}\text{\hspace{0.17em}},\end{array}$`

where

• defectsij is the number of defects observed in the batch produced by factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).

• μij is the mean number of defects corresponding to factory i during batch j.

• supplier_Cij and supplier_Bij are dummy variables that indicate whether company `C` or `B`, respectively, supplied the process chemicals for the batch produced by factory i during batch j.

• bi ~ N(0,σb2) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.

Using Wilkinson notation, specify this model as:

```'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)'```

To account for the Poisson distribution of the response variable, when fitting the model using `fitglme`, specify the `'Distribution'` name-value pair argument as `'Poisson'`. By default, `fitglme` uses a log link function for response variables with a Poisson distribution.

### Display the Model

The output of the fitting function `fitglme` provides information about generalized linear mixed-effects model.

Using the `mfr` manufacturing experiment data, fit a model using `newprocess`, `time_dev`, `temp_dev`, and `supplier` as fixed-effects predictors. Specify the response distribution as Poisson, the link function as log, and the fit method as Laplace.

```load mfr glme = fitglme(mfr,... 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',... 'Distribution','Poisson','Link','log','FitMethod','Laplace',... 'DummyVarCoding','effects')```
```glme = Generalized linear mixed-effects model fit by ML Model information: Number of observations 100 Fixed effects coefficients 6 Random effects coefficients 20 Covariance parameters 1 Distribution Poisson Link Log FitMethod Laplace Formula: defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory) Model fit statistics: AIC BIC LogLikelihood Deviance 416.35 434.58 -201.17 402.35 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue '(Intercept)' 1.4689 0.15988 9.1875 94 9.8194e-15 'newprocess' -0.36766 0.17755 -2.0708 94 0.041122 'time_dev' -0.094521 0.82849 -0.11409 94 0.90941 'temp_dev' -0.28317 0.9617 -0.29444 94 0.76907 'supplier_C' -0.071868 0.078024 -0.9211 94 0.35936 'supplier_B' 0.071072 0.07739 0.91836 94 0.36078 Lower Upper 1.1515 1.7864 -0.72019 -0.015134 -1.7395 1.5505 -2.1926 1.6263 -0.22679 0.083051 -0.082588 0.22473 Random effects covariance parameters: Group: factory (20 Levels) Name1 Name2 Type Estimate '(Intercept)' '(Intercept)' 'std' 0.31381 Group: Error Name Estimate 'sqrt(Dispersion)' 1 ```

The `Model information` table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a `Poisson` distribution, the link function is `Log`, and the fit method is `Laplace`.

`Formula` indicates the model specification using Wilkinson’s notation.

The `Model fit statistics` table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (`AIC`), Bayesian information criterion (`BIC`) values, log likelihood (`LogLikelihood`), and deviance (`Deviance`) values.

The `Fixed effects coefficients` table indicates that `fitglme` returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (`Name`) contains the name of each fixed-effects coefficient, column 2 (`Estimate`) contains its estimated value, and column 3 (`SE`) contains the standard error of the coefficient. Column 4 (`tStat`) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (`DF`) and column 6 (`pValue`) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (`Lower` and `Upper`) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

`Random effects covariance parameters` displays a table for each grouping variable (here, only `factory`), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, `std` indicates that `fitglme` returns the standard deviation of the random effect associated with the factory predictor, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.

The standard display generated by `fitglme` does not provide confidence intervals for the random-effects parameters. To compute and display these values, use `covarianceParameters`.

### Work with the Model

After you create a GLME model using `fitglme`, you can use additional functions to work with the model.

#### Inspect and Test Coefficients and Confidence Intervals

To extract estimates of the fixed- and random-effects coefficients, covariance parameters, design matrices, and related statistics:

• `fixedEffects` extracts estimated fixed-effects coefficients and related statistics from a fitted model. Related statistics include the standard error; the t-statistic, degrees of freedom, and p-value for a hypothesis test of whether each parameter is equal to 0; and the confidence intervals.

• `randomEffects` extracts estimated random-effects coefficients and related statistics from a fitted GLME model. Related statistics include the estimated empirical Bayes predictor (EBP) of each random effect, the square root of the conditional mean squared error of prediction (CMSEP) given the covariance parameters and the response; the t-statistic, estimated degrees of freedom, and p-value for a hypothesis test of whether each random effect is equal to 0; and the confidence intervals.

• `covarianceParameters` extracts estimated covariance parameters and related statistics from a fitted GLME model. Related statistics include estimate of the covariance parameter, and the confidence intervals.

• `designMatrix` extracts the fixed- and random-effects design matrices, or a specified subset thereof, from the fitted GLME model.

To conduct customized hypothesis tests for the significance of fixed- and random-effects coefficients, and to compute custom confidence intervals:

• `anova` performs a marginal F-test (hypothesis test) on fixed-effects terms, to determine if all coefficients representing the fixed-effects terms are equal to 0. You can use `anova` to test the combined significance of the coefficients of categorical predictors.

• `coefCI` computes confidence intervals for fixed- and random-effects parameters from a fitted GLME model. By default, `fitglme` computes 95% confidence intervals. Use `coefCI` to compute the boundaries at a different confidence level.

• `coefTest` performs custom hypothesis tests on fixed-effects or random-effects vectors of a fitted generalized linear mixed-effects model. For example, you can specify contrast matrices.

#### Generate New Response Values and Refit Model

To generate new response values, including fitted, predicted, and random responses, based on the fitted GLME model:

• `fitted` computes fitted response values using the original predictor values, and the estimated coefficient and parameter values from the fitted model.

• `predict` computes the predicted conditional or marginal mean of the response using either the original predictor values or new predictor values, and the estimated coefficient and parameter values from the fitted model.

• `random` generates random responses from a fitted model.

• `refit` creates a new fitted GLME model, based on the original model and a new response vector.

#### Inspect and Visualize Residuals

To extract and visualize residuals from the fitted GLME model:

• `residuals` extracts the raw or Pearson residuals from the fitted model. You can also specify whether to compute the conditional or marginal residuals.

• `plotResiduals` creates plots using the raw or Pearson residuals from the fitted model, including:

• A histogram of the residuals

• A scatterplot of the residuals versus fitted values

• A scatterplot of residuals versus lagged residuals