Why is fitlm affected by variable scale?

Harold Matthews on 1 Dec 2021
Edited: Devendra on 13 Apr 2024 at 4:58
Dear all,
My statistics background is fairly solid, and my understanding is that if you fit a linear regression, the scale of the X and Y variables should not affect the resulting p-values. I am running fitlm on some data (see the attached demo and data), and changing the scale of the variables by transforming them to z-scores has a profound effect on the resulting p-values. In the attached code (Demo.m) I fit two models with the same model design to the same data (in the attached 'Data.mat' file). The only difference is that in model 1 the X and Y variables are normalised to z-scores and in model 2 they are not. I then scatter the p-values of the two models against each other. You can see in the upper left corner that two p-values that were not significant for model 1 become significant for model 2.
Sorry I cannot get the demo code embedded in this question, so I have attached it. If anyone has any insights into this that would be great :)
1 comment
Devendra on 13 Apr 2024 at 4:57
Thank you very much for the detailed explanation. I am getting weird results from the fitlm function in my MATLAB code. I am attaching the code and the input data file; could you kindly have a look at the code and suggest how to get the correct results?
I would appreciate your kind cooperation.
Deva


Accepted Answer

Ive J on 1 Dec 2021
Well, the real question would be why not?
You have introduced interaction terms into the model. With them, the two models test different hypotheses for every term except the interaction terms themselves. You can find a good explanation here. When you remove the interaction terms, all t-stats are the same for both models.
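The point above can be checked with a minimal sketch. The data here are synthetic (an assumption, since the original Data.mat is not reproduced in this thread), and the variable names are illustrative only. The sketch fits a main-effects model and an interaction model to raw and z-scored versions of the same data and prints the t-statistics side by side.

```matlab
% Synthetic data; an assumption standing in for the poster's Data.mat.
rng(0);                                      % reproducible random draws
n = 200;
X = [5 + randn(n,1), 10 + 2*randn(n,1)];     % predictors with non-zero means
y = 1 + 0.5*X(:,1) - 0.3*X(:,2) + 0.2*X(:,1).*X(:,2) + randn(n,1);

zX = zscore(X);                              % column-wise z-scores
zy = zscore(y);

% Main effects only: rescaling leaves the slope t-statistics unchanged.
rawLin = fitlm(X,  y,  'linear');
zLin   = fitlm(zX, zy, 'linear');
disp([rawLin.Coefficients.tStat(2:end), zLin.Coefficients.tStat(2:end)])

% With the x1:x2 interaction: rows are (Intercept), x1, x2, x1:x2.
% The x1 and x2 rows differ between the two fits; the x1:x2 row does not.
rawInt = fitlm(X,  y,  'interactions');
zInt   = fitlm(zX, zy, 'interactions');
disp([rawInt.Coefficients.tStat, zInt.Coefficients.tStat])
```

In the interaction model the coefficient on x1 is the slope of x1 when x2 equals zero. After z-scoring, zero corresponds to the mean of x2, so the two fits test the x1 slope at different values of x2.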
1 comment
Devendra on 13 Apr 2024 at 4:54
Edited: Devendra on 13 Apr 2024 at 4:58
Thanks for the valuable information.


More Answers (1)

Jeff Miller on 1 Dec 2021
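Your understanding is correct for a regression with only linear terms, but your model is nonlinear in the predictors because of the interaction terms. Z-scoring a column does not change what it measures, but it does change the product of two columns. Consider: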
zX = zscore(X);
corr(X(:,1),zX(:,1))
ans =
1
corr(X(:,1).*X(:,2),zX(:,1).*zX(:,2))
ans =
0.2421
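The low correlation between the two product columns follows from expanding the product of the z-scored columns. As a sketch, write m_j and s_j for the sample mean and standard deviation of column j, and b_0 to b_3 for the coefficients of the raw-scale model (notation introduced here, not in the original post):

```latex
z_1 z_2 = \frac{(x_1 - m_1)(x_2 - m_2)}{s_1 s_2}
        = \frac{x_1 x_2 - m_2 x_1 - m_1 x_2 + m_1 m_2}{s_1 s_2}

% Substituting x_j = m_j + s_j z_j into y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2:
y = \text{const} + s_1 (b_1 + b_3 m_2)\, z_1 + s_2 (b_2 + b_3 m_1)\, z_2 + s_1 s_2 b_3\, z_1 z_2
```

The standardized product mixes x_1 x_2 with x_1 and x_2 themselves, so it is not a rescaling of the raw product. The interaction coefficient is only rescaled (b_3 becomes s_1 s_2 b_3), which leaves its t-statistic unchanged. The coefficient on z_1, however, tests b_1 + b_3 m_2 = 0 rather than b_1 = 0, and similarly for z_2. Z-scoring Y as well divides every coefficient by the same constant and does not affect the slope t-statistics.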

Release

R2020a
