coefTest

Linear hypothesis test on multinomial regression model coefficients

Since R2023a

    Description

    p = coefTest(mdl) computes the p-value for an F-test that all coefficient estimates in mdl are zero.


    p = coefTest(mdl,H) performs an F-test that H × B = 0, where B represents the coefficient vector. Use H to specify the coefficients to include in the F-test.

    p = coefTest(mdl,H,C) performs an F-test that H × B = C.

    [p,F] = coefTest(___) also returns the F-test statistic F, using any of the input argument combinations in the previous syntaxes.

    [p,F,r] = coefTest(___) also returns the numerator degrees of freedom r for the test.


    Examples


    Load the fisheriris data set.

    load fisheriris

    The column vector species contains iris flowers of three different species: setosa, versicolor, and virginica. The matrix meas contains four types of measurements for the flowers: the length and width of sepals and petals in centimeters.

    Create a table from the iris measurements and species data by using the array2table function.

    tbl = array2table(meas,...
        VariableNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]);
    tbl.Species = species;

    Fit a multinomial regression model using the petal measurements as the predictor data and the species as the response data.

    mdl = fitmnr(tbl,"Species ~ PetalLength + PetalWidth^2")
    mdl = 
    Multinomial regression with nominal responses
    
                                    Value       SE       tStat       pValue  
                                   _______    ______    _______    __________
    
        (Intercept_setosa)           136.9    12.587     10.876    1.4933e-27
        PetalLength_setosa         -17.351    7.0021     -2.478      0.013211
        PetalWidth_setosa          -77.383     24.06    -3.2163     0.0012987
        PetalWidth^2_setosa        -24.719    8.3324    -2.9666     0.0030111
        (Intercept_versicolor)      8.2731    14.489      0.571         0.568
        PetalLength_versicolor     -5.7089    2.0638    -2.7662     0.0056709
        PetalWidth_versicolor       35.208     21.97     1.6026       0.10903
        PetalWidth^2_versicolor    -14.041    7.1653    -1.9596      0.050037
    
    
    150 observations, 292 error degrees of freedom
    Dispersion: 1
    Chi^2-statistic vs. constant model: 309.3988, p-value = 7.9151e-64
    

    mdl is a multinomial regression model object that contains the results of fitting a nominal multinomial regression model to the data. The chi-squared statistic and p-value correspond to the null hypothesis that the fitted model does not outperform a degenerate model consisting of only an intercept term. The small p-value indicates that enough evidence exists to reject the null hypothesis and conclude that the fitted model outperforms the intercept-only model.

    Perform an F-test of the null hypothesis that all coefficients, except the intercept term, are zero. Interpret the result at the 5% significance level (95% confidence level).

    p = coefTest(mdl)
    p = 
    3.5512e-133
    

    The small p-value in the output indicates that enough evidence exists to reject the null hypothesis that all coefficients are zero. That is, at least one of the fitted model coefficients is statistically significant at the 5% significance level.

    Load the carsmall data set.

    load carsmall

    The variables Acceleration, Weight, and Model_Year contain data for car acceleration, weight, and model year, respectively. The variable MPG contains car mileage data in miles per gallon (MPG).

    Sort the data in MPG into four response categories by using the discretize function.

    MPG = discretize(MPG,[9 19 29 39 48]);
    tbl = table(MPG,Acceleration,Weight,Model_Year);

    Fit a multinomial regression model of the car mileage as a function of the acceleration, weight, and model year.

    mdl = fitmnr(tbl,"MPG ~ Acceleration + Model_Year + Weight",CategoricalPredictors="Model_Year")
    mdl = 
    Multinomial regression with nominal responses
    
                            Value         SE         tStat       pValue   
                           ________    _________    _______    ___________
    
        (Intercept_1)        154.38       15.697      9.835     7.9576e-23
        Acceleration_1       -11.31      0.53323     -21.21    7.7405e-100
        Weight_1           0.098347    0.0034745     28.306    2.9244e-176
        Model_Year_76_1      182.33       4.5868      39.75              0
        Model_Year_82_1     -1690.4       4.6231    -365.64              0
        (Intercept_2)        177.87       14.211     12.516     6.0891e-36
        Acceleration_2       -11.28      0.48884    -23.076    8.1522e-118
        Weight_2           0.090009    0.0030349     29.658    2.6661e-193
        Model_Year_76_2      187.19       4.2373     44.176              0
        Model_Year_82_2      -136.5       3.4781    -39.244              0
        (Intercept_3)        103.66       14.991     6.9146     4.6928e-12
        Acceleration_3      -11.359      0.48805    -23.274    8.2157e-120
        Weight_3           0.080071    0.0033652     23.794    3.8879e-125
        Model_Year_76_3      283.31       4.7309     59.885              0
        Model_Year_82_3     -34.727       4.0878    -8.4953     1.9743e-17
    
    
    94 observations, 267 error degrees of freedom
    Dispersion: 1
    Chi^2-statistic vs. constant model: 169.6193, p-value = 5.7114e-30
    

    mdl is a multinomial regression model object that contains the results of fitting a nominal multinomial regression model to the data. By default, the fourth response category is the reference category. Each row of the table output corresponds to the coefficient of the model term in the first column. The tStat and pValue columns contain the t-statistics and p-values, respectively, for the null hypothesis that the corresponding coefficient is zero. The small p-values for the Model_Year terms indicate that the model year has a statistically significant effect on the response category probabilities. For example, the p-value for the term Model_Year_76_2 indicates that a car being manufactured in 1976 has a statistically significant effect on ln(π_2/π_4), where π_i is the probability of the ith response category.

    You can use a numeric index matrix to investigate whether a group of coefficients contains a coefficient that is statistically significant. Use a numeric index matrix to test the null hypothesis that all coefficients corresponding to the Model_Year terms are zero.

    idx_Model_Year = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0;...
                      0 0 0 0 1 0 0 0 0 0 0 0 0 0 0;...
                      0 0 0 0 0 0 0 0 1 0 0 0 0 0 0;...
                      0 0 0 0 0 0 0 0 0 1 0 0 0 0 0;...
                      0 0 0 0 0 0 0 0 0 0 0 0 0 1 0;...
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1;...
    ];
    
    [p_Model_Year,F_Model_Year,r_Model_Year] = coefTest(mdl,idx_Model_Year)
    p_Model_Year = 
    0
    
    F_Model_Year = 
    4.8985e+04
    
    r_Model_Year = 
    6
    

    The returned p-value indicates that at least one of the category coefficients corresponding to Model_Year is statistically different from zero. This result is consistent with the small p-value for each of the Model_Year coefficients.
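    Typing the index matrix by hand is error-prone when the model has many coefficients. The following is a minimal sketch of building the same matrix from the coefficient names. It assumes that the row names of the mdl.Coefficients table match the term names shown in the model display and appear in the same order; the variable names names, I, and H are illustrative only.

```matlab
% Refit the model from the example so that the snippet is self-contained
load carsmall
MPG = discretize(MPG,[9 19 29 39 48]);
tbl = table(MPG,Acceleration,Weight,Model_Year);
mdl = fitmnr(tbl,"MPG ~ Acceleration + Model_Year + Weight", ...
    CategoricalPredictors="Model_Year");

% Assumption: the Coefficients table stores the term names as row names
names = string(mdl.Coefficients.Properties.RowNames);

% Keep one row of the identity matrix for each Model_Year coefficient
I = eye(numel(names));
H = I(startsWith(names,"Model_Year"),:);

% Test the null hypothesis that all selected coefficients are zero
[p,F,r] = coefTest(mdl,H)
```

    Each row of H selects exactly one coefficient, so H has one row for each Model_Year term and one column for each model coefficient.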

    Input Arguments


    mdl: Multinomial regression model object, specified as a MultinomialRegression model object created with the fitmnr function.

    H: Hypothesis matrix, specified as a full-rank numeric index matrix of size r-by-s, where r is the number of linear combinations of coefficients being tested, and s is the total number of coefficients.

    • If you specify H, then the output p is the p-value for an F-test that H × B = 0, where B represents the coefficient vector.

    • If you specify H and C, then the output p is the p-value for an F-test that H × B = C.

    Example: [1 0 0 0 0] tests the first coefficient among five coefficients.

    Data Types: single | double | logical

    C: Hypothesized value for testing the null hypothesis, specified as a numeric vector with the same number of rows as H.

    If you specify H and C, then the output p is the p-value for an F-test that H × B = C, where B represents the coefficient vector.

    Data Types: single | double
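    As an illustration of how H and C work together, the sketch below reuses the iris model from the first example. It assumes that the coefficients are ordered as in the model display (eight coefficients, setosa terms first). The hypothesized value -5 is arbitrary and chosen only for demonstration.

```matlab
% Refit the iris model from the first example
load fisheriris
tbl = array2table(meas,...
    VariableNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]);
tbl.Species = species;
mdl = fitmnr(tbl,"Species ~ PetalLength + PetalWidth^2");

% Joint test that both PetalWidth^2 coefficients are zero (H*B = 0).
% Assumption: columns 4 and 8 correspond to PetalWidth^2_setosa and
% PetalWidth^2_versicolor, following the order of the model display.
H = zeros(2,8);
H(1,4) = 1;
H(2,8) = 1;
[p1,F1,r1] = coefTest(mdl,H)

% Test that PetalLength_versicolor (column 6) equals a hypothesized
% value (H*B = C). The value -5 is hypothetical.
H2 = [0 0 0 0 0 1 0 0];
C = -5;
[p2,F2,r2] = coefTest(mdl,H2,C)
```

    C has one element for each row of H, so the second test uses a scalar C with a one-row H.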

    Output Arguments


    p: p-value for the F-test, returned as a numeric value in the range [0,1].

    F: Value of the test statistic for the F-test, returned as a numeric value.

    r: Numerator degrees of freedom for the F-test, returned as a positive integer. The F-statistic has r degrees of freedom in the numerator and mdl.DFE degrees of freedom in the denominator.
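    The three outputs are related through the F distribution. The sketch below recomputes a p-value from F, r, and mdl.DFE as the upper-tail probability of an F distribution by using the fcdf function. It assumes that coefTest derives p in this standard way, which this page does not state explicitly, and that the coefficients are ordered as in the model display. The name pCheck is illustrative only.

```matlab
% Refit the iris model from the first example
load fisheriris
tbl = array2table(meas,...
    VariableNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]);
tbl.Species = species;
mdl = fitmnr(tbl,"Species ~ PetalLength + PetalWidth^2");

% Jointly test the two PetalWidth^2 coefficients (columns 4 and 8)
H = [0 0 0 1 0 0 0 0; ...
     0 0 0 0 0 0 0 1];
[p,F,r] = coefTest(mdl,H);

% Upper-tail probability of an F distribution with r numerator and
% mdl.DFE denominator degrees of freedom
pCheck = fcdf(F,r,mdl.DFE,'upper');

% Compare the two values; they agree only if the assumption above holds
disp([p pCheck])
```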

    Version History

    Introduced in R2023a