Using a GLM model to predict the response factor from a fixed factor

Question

Hari krishnan on 13 Oct 2021

Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/1562981-using-a-glm-model-to-predict-the-response-factor-from-a-fixed-factor

Commented: the cyclist on 16 Oct 2021

Pub_fraction.xlsx

I want to use a generalized linear model, with the fraction of people going to a pub (first column) as the response and the discount level (second column) as a fixed factor, to see whether the model is significantly different from a null model.

g = fitglm(a(:,1),[a(:,2),a(:,3)],'linear','distr','binomial','link','probit')

I got the following results from the analysis:

Generalized linear regression model:
    probit(y) ~ 1 + x1
    Distribution = Binomial
Estimated Coefficients:
                   Estimate       SE         tStat      pValue 
                   ________    _________    _______    ________
    (Intercept)    -1.4245       0.64976    -2.1923    0.028356
    x1             0.00651     0.0071339    0.91255     0.36148
35 observations, 33 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 0.869, p-value = 0.351

Does this mean that there is a significant difference between the null model and the GLM? And can the 'fraction of people going to the pub' be predicted using a linear model with varying access levels? Why are there so many degrees of freedom, even though I only have 4 'discount levels'? The data is attached for reference.

Any help will be appreciated.

3 comments (1 older comment not shown)

Jeff Miller on 14 Oct 2021

It is difficult to understand what you are asking.

The "fraction of people" column in the Excel file has only 4 different values (also, it is strange that the value in this column is always 25*Pub_name).
You say there are only 4 discount levels, but there are many more than four values in the discount levels column of the Excel file.
The fitglm command references 3 columns of 'a' but your description only mentions 2 variables.
It is not clear how 'a' relates to the excel file. Maybe the column labels are wrong in the Excel file?

If you really only have four discount levels, maybe the simplest thing to do is to compute and compare the average fraction at each discount level?
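
As a rough illustration of the per-level comparison suggested above, here is a minimal sketch. It assumes the discount level and the fraction are available as two numeric column vectors; the names `discount` and `fraction` and all of the numbers are invented for this example and are not taken from Pub_fraction.xlsx.

```matlab
% Hypothetical data: 4 discount levels, 5 made-up observations per level.
% These numbers are invented for illustration (NOT from Pub_fraction.xlsx).
discount = repelem([0; 10; 20; 30], 5);
fraction = [20; 25; 30; 25; 25; ...
            45; 50; 55; 50; 50; ...
            70; 75; 80; 75; 75; ...
            95; 100; 100; 100; 100];

% Assign each observation to a group, one group per distinct discount level
[groupIdx, levels] = findgroups(discount);

% Mean fraction and number of observations within each group
meanFraction = splitapply(@mean, fraction, groupIdx);
nPerLevel    = splitapply(@numel, fraction, groupIdx);

% Show the per-level summary side by side
summaryTable = table(levels, nPerLevel, meanFraction)
```

If the means differ clearly between levels, that is already a useful first look before fitting any model.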

Hari krishnan on 15 Oct 2021

Yes, I can perform that.


Answer 1

the cyclist on 15 Oct 2021

Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/1562981-using-a-glm-model-to-predict-the-response-factor-from-a-fixed-factor#answer_809036

Pub_fraction.xlsx

I'm very confused about how you have coded the model. Specifically,

The first input to fitglm should be the predictor variable, and the second input should be the response. You seem to have done the opposite.
You don't mention wanting to include 'Pub name' at all, but you have it in the model.
I'm not sure why you chose that particular link function, which does not seem sensible to me.

There does seem to be a relationship between your data, as seen by plotting it. (Always plot your data!)

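% Read the numeric data from the attached spreadsheet
a = xlsread('Pub_fraction.xlsx');

% Scatter plot with column 2 on the x-axis and column 1 on the y-axis
figure
scatter(a(:,2),a(:,1))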

Here is an ordinary linear regression, which shows a statistically significant relationship. Perhaps a different link function makes more sense. I did not think carefully about it.

g = fitglm(a(:,2),a(:,1))
g = 
Generalized linear regression model:
    y ~ 1 + x1
    Distribution = Normal

Estimated Coefficients:
                   Estimate      SE      tStat       pValue  
                   ________    ______    ______    __________

    (Intercept)     26.335     3.2889    8.0073      3.08e-09
    x1               83.27     4.7896    17.385    3.5266e-18


35 observations, 33 error degrees of freedom
Estimated Dispersion: 79.7
F-statistic vs. constant model: 302, p-value = 3.53e-18
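
To connect the output above to the original questions about the null model and the degrees of freedom, here is a minimal sketch on made-up data. The variables `x` and `y` are hypothetical stand-ins for the two spreadsheet columns, not the real values. It assumes the Statistics and Machine Learning Toolbox is available, and it uses the `devianceTest` method and the `DFE`, `NumObservations` and `NumEstimatedCoefficients` properties of the fitted model.

```matlab
% Hypothetical data standing in for the spreadsheet (NOT the real values)
rng(0);                                   % reproducible noise
x = linspace(0, 1, 35)';                  % made-up predictor, 35 observations
y = 25 + 80*x + 5*randn(35, 1);           % made-up response with a linear trend

g = fitglm(x, y);                         % predictor first, response second

% Error degrees of freedom = observations minus estimated coefficients.
% With an intercept and one slope this is 35 - 2 = 33, regardless of how
% many distinct values the predictor takes.
dfe = g.NumObservations - g.NumEstimatedCoefficients;
fprintf('Error DOF: %d (model reports %d)\n', dfe, g.DFE);

% Test the fitted model against the constant (intercept-only) model
nullTest = devianceTest(g)

% p-value for the slope coefficient alone
slopeP = g.Coefficients.pValue(2);
```

A small p-value in the test against the constant model indicates that the predictor improves on the null model; a large one, such as the 0.351 in the question, does not.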

2 comments

Hari krishnan on 15 Oct 2021

Thank you for your answer.

The predictor variable is in the first column and the response variable is in the second column.

I chose the binomial function because there are only two possible options: either to take the discount or not.

the cyclist on 16 Oct 2021

I'm still confused.

You have made the following two statements:

fraction of the people going to a pub(first column) as the response
the response variable is in the second column

But in your file, "fraction of people going to pub" is the first column.

Also, you say you chose the binomial function because there are two options, "take the discount or not". But that is NOT what you have in your data. "Take the discount or not" seems like it should be TRUE/FALSE or 1/0. But your variable seems to be a probability (25, 50, 75, 100 percent?). You will not be able to fit a logistic model to these data.
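
For comparison, here is a minimal sketch of the kind of data a binomial model does accept: a count of "successes" at each predictor value together with the number of trials, passed through the `'BinomialSize'` option of `fitglm`. The variable names and every number here are hypothetical and invented for illustration; nothing in the thread says that such counts exist for the pub data.

```matlab
% Hypothetical counts, invented for illustration (NOT from Pub_fraction.xlsx)
discount = [0; 10; 20; 30];        % made-up discount levels
nOffered = [40; 40; 40; 40];       % made-up number of people at each level
nWent    = [10; 19; 31; 38];       % made-up number who went to the pub

% Binomial response given as counts of successes, with the number of
% trials passed through the 'BinomialSize' option (default logit link)
gCounts = fitglm(discount, nWent, 'linear', ...
    'Distribution', 'binomial', 'BinomialSize', nOffered);

% Fitted probabilities of going to the pub at each discount level
pHat = predict(gCounts, discount);
```

The same call also works with a 0/1 response per person, in which case `'BinomialSize'` can be left at its default of 1.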
