devianceTest

Analysis of deviance for generalized linear regression model

Description


tbl = devianceTest(mdl) returns an analysis of deviance table for the generalized linear regression model mdl. The table tbl gives the result of a test that determines whether the model mdl fits significantly better than a constant model.

Examples


Perform a deviance test on a generalized linear regression model.

Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).

rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);

Create a generalized linear regression model of Poisson data.

mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')
mdl =
Generalized linear regression model:
log(y) ~ 1 + x1 + x2
Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______

    (Intercept)     1.0405      0.022122    47.034      0
    x1              0.9968      0.003362    296.49      0
    x2               1.987     0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0

Test whether the model differs from a constant in a statistically significant way.

tbl = devianceTest(mdl)
tbl=2×4 table
                               Deviance     DFE     chi2Stat     pValue
                              __________    ___    __________    ______

    log(y) ~ 1                2.9544e+05    99
    log(y) ~ 1 + x1 + x2           107.4    97     2.9533e+05      0

The small p-value indicates that the model significantly differs from a constant. Note that the model display of mdl includes the statistics shown in the second row of the table.
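As a sanity check, the statistics in the second row can be reproduced directly from the fitted models. This sketch assumes mdl, X, and y from the example above are still in the workspace:

```matlab
% Refit the constant-only model and recompute the test by hand.
mdl0 = fitglm(X,y,'y ~ 1','Distribution','poisson');   % constant model

D  = mdl0.Deviance - mdl.Deviance;   % deviance difference (chi2Stat)
df = mdl0.DFE - mdl.DFE;             % 99 - 97 = 2 degrees of freedom
p  = 1 - chi2cdf(D,df);              % upper-tail chi-square p-value

% D and p match the chi2Stat and pValue entries in the second row
% of devianceTest(mdl).
```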

Input Arguments


mdl — Generalized linear regression model

Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm, or a CompactGeneralizedLinearModel object created using compact.

Output Arguments


tbl — Analysis of deviance summary statistics

Analysis of deviance summary statistics, returned as a table.

tbl contains analysis of deviance statistics for both a constant model and the model mdl. The table includes these columns for each model.

Column — Description

Deviance — Twice the difference between the loglikelihoods of the saturated model and the corresponding model (mdl or constant). For more information, see Deviance.

DFE — Degrees of freedom for the error (residuals), equal to n – p, where n is the number of observations and p is the number of estimated coefficients.

chi2Stat — F-statistic or chi-squared statistic, depending on whether the dispersion is estimated (F-statistic) or not (chi-squared statistic).

• The F-statistic is the difference between the deviance of the constant model and the deviance of the full model, divided by the estimated dispersion.

• The chi-squared statistic is the difference between the deviance of the constant model and the deviance of the full model.

pValue — p-value associated with the test: a chi-squared statistic with p – 1 degrees of freedom, or an F-statistic with p – 1 numerator degrees of freedom and DFE denominator degrees of freedom, where p is the number of estimated coefficients.

More About

Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to a saturated model.

The deviance of a model M1 is twice the difference between the loglikelihood of the saturated model Ms and the loglikelihood of the model M1. A saturated model is a model with the maximum number of parameters that you can estimate.

For example, if you have n observations (yi, i = 1, 2, ..., n) with potentially different values for ${x}_{i}^{T}\beta$, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model with the parameters b. Then the deviance of the model M1 is

$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$

where b1 and bS contain the estimated parameters for the model M1 and the saturated model, respectively. The deviance has a chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in the model M1.
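For a Poisson model, the saturated model reproduces each observation exactly (fitted mean equal to yi), and the deviance reduces to the closed form 2Σ[yi log(yi/μ̂i) – (yi – μ̂i)]. A sketch, assuming the Poisson fit mdl and data X, y from the example above:

```matlab
% Poisson deviance from its definition, with the 0*log(0) = 0 convention.
muhat = predict(mdl,X);           % fitted means
t = y .* log(y ./ muhat);
t(y == 0) = 0;                    % limit of y*log(y/mu) as y -> 0
D1 = 2 * sum(t - (y - muhat));    % agrees with mdl.Deviance
```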

Assume you have two different generalized linear regression models M1 and M2, where M1 has a subset of the terms in M2. You can assess the fit of the models by comparing their deviances D1 and D2. The difference of the deviances is

$D={D}_{1}-{D}_{2}=-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)+2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right)=2\left(\mathrm{log}L\left({b}_{2},y\right)-\mathrm{log}L\left({b}_{1},y\right)\right).$

Asymptotically, the difference D has a chi-square distribution with degrees of freedom v equal to the difference in the number of parameters estimated in M1 and M2. You can obtain the p-value for this test by using 1 – chi2cdf(D,v).

Typically, you examine D using a model M1 that contains only a constant term and no predictors. Then D has a chi-square distribution with p – 1 degrees of freedom, where p is the number of estimated coefficients in M2. If the dispersion is estimated, the difference divided by the estimated dispersion has an F distribution with p – 1 numerator degrees of freedom and n – p denominator degrees of freedom.
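When the dispersion is estimated (for example, by fitting with 'DispersionFlag',true in fitglm), devianceTest reports an F-statistic, and the p-value comes from fcdf instead of chi2cdf. A hedged sketch; the division by the numerator degrees of freedom follows the usual F construction and is an assumption here, not taken from this page:

```matlab
% F-test variant when the dispersion is estimated from the data.
mdlF = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson', ...
              'DispersionFlag',true);
mdl0 = fitglm(X,y,'y ~ 1','Distribution','poisson');

v = mdl0.DFE - mdlF.DFE;                                  % p - 1
F = (mdl0.Deviance - mdlF.Deviance)/(v*mdlF.Dispersion);  % F-statistic
p = 1 - fcdf(F,v,mdlF.DFE);                               % F-based p-value
```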