Using a GLM model to predict the response factor from a fixed factor

I want to use a generalized linear model, with the fraction of the people going to a pub (first column) as the response and the discount level (second column) as a fixed factor, to see whether the model is significantly different from a null model.
g = fitglm(a(:,1),[a(:,2),a(:,3)],'linear','distr','binomial','link','probit')
I got the following results from the analysis:
Generalized linear regression model:
probit(y) ~ 1 + x1
Distribution = Binomial
Estimated Coefficients:
                   Estimate       SE          tStat      pValue
                   ________    _________    _______    ________
    (Intercept)     -1.4245      0.64976    -2.1923    0.028356
    x1              0.00651    0.0071339    0.91255     0.36148
35 observations, 33 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 0.869, p-value = 0.351
Does this mean that there is a significant difference between the null model and the GLM? And can the 'fraction of people going to the pub' be predicted using a linear model with varying access levels? Why are there so many degrees of freedom, even though I only have 4 'discount levels'? The data is provided for reference.
Any help will be appreciated.

Accepted Answer

the cyclist on 15 Oct 2021
I'm very confused about how you have coded the model. Specifically,
  • The first input to fitglm should be the predictor variable, and the second input should be the response. You seem to have done the opposite.
  • You don't mention wanting to include 'Pub name' at all, but you have it in the model.
  • I'm not sure why you chose that particular link function; it does not seem sensible to me.
There does seem to be a relationship between your data, as seen by plotting it. (Always plot your data!)
a = xlsread('Pub_fraction.xlsx');
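A minimal sketch of that plotting step, for readers who do not have the spreadsheet. The numbers below are hypothetical stand-in data, not the contents of Pub_fraction.xlsx; the only thing taken from the thread is the layout (35 rows, response in column 1, discount level in column 2).

```matlab
% Hypothetical stand-in for the spreadsheet: 35 rows, 4 discount levels.
rng(1);                                        % reproducible noise
discount = repmat([0.25 0.5 0.75 1], 1, 9);    % 1-by-36 row of repeated levels
discount = discount(1:35)';                    % keep 35 observations as a column
fraction = 25 + 80*discount + 9*randn(35,1);   % made-up response values
a = [fraction, discount];                      % same column order as in the question

% Plot the response (column 1) against the predictor (column 2).
figure
scatter(a(:,2), a(:,1), 'filled')
xlabel('Discount level (column 2)')
ylabel('Fraction going to the pub (column 1)')
```

With the real file, only the last four lines would be needed after the xlsread call.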
Here is an ordinary linear regression, which shows a statistically significant relationship. Perhaps a different link function makes more sense; I did not think carefully about it.
g = fitglm(a(:,2),a(:,1))
g =
Generalized linear regression model:
y ~ 1 + x1
Distribution = Normal
Estimated Coefficients:
                   Estimate      SE       tStat       pValue
                   ________    ______    ______    __________
    (Intercept)     26.335     3.2889    8.0073      3.08e-09
    x1               83.27     4.7896    17.385    3.5266e-18
35 observations, 33 error degrees of freedom
Estimated Dispersion: 79.7
F-statistic vs. constant model: 302, p-value = 3.53e-18
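The "vs. constant model" line at the bottom of the display is the comparison against the null (intercept-only) model. A small sketch of how to pull that test out explicitly, again on hypothetical stand-in data rather than the real spreadsheet, assuming the Statistics and Machine Learning Toolbox is available:

```matlab
% Hypothetical stand-in data (NOT the values in Pub_fraction.xlsx).
rng(1);
x = repmat([0.25; 0.5; 0.75; 1], 9, 1);   % 36-by-1 column of repeated levels
x = x(1:35);                              % keep 35 observations
y = 25 + 80*x + 9*randn(35,1);            % made-up response

% Predictor first, response second. The defaults are a normal
% distribution with an identity link, i.e. ordinary linear regression.
g = fitglm(x, y);

% Test of the fitted model against the constant (intercept-only) model.
tbl = devianceTest(g)

% Error degrees of freedom = observations minus estimated coefficients.
dfe = g.DFE
```

A small p-value in the test means the model fits significantly better than the constant model; a large one (such as the 0.351 in the question) means it does not. The 33 error degrees of freedom come from 35 observations minus 2 estimated coefficients (intercept and slope), because the discount level is treated as one continuous predictor rather than as a 4-level categorical factor.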
the cyclist on 16 Oct 2021
I'm still confused.
You have made the following two statements:
  • fraction of the people going to a pub(first column) as the response
  • the response variable is in the second column
But in your file, "fraction of people going to pub" is the first column.
Also, you say you chose the binomial function because there are two options, "take the discount or not". But that is NOT what you have in your data. "Take the discount or not" seems like it should be TRUE/FALSE or 1/0. But your variable seems to be a probability (25, 50, 75, 100 percent?). You will not be able to fit a logistic model to these data.
