MATLAB Answers


How to do feature selection by maximizing Rsquared for a linear regression model?

Latest activity Commented on by the cyclist
on 11 Jun 2019
Hi everyone,
I am doing a project to build a predictive model, but first I want to keep only the important features for the model, so I am using feature selection.
I went through the example at https://www.mathworks.com/help/stats/feature-selection.html and the code works properly.
But I want to use Rsquared instead of the Deviance used in that example; that is, I want to select the features that give a good Rsquared value (>0.85).
Can anyone help me out with the code? Thanks!

  1 Comment

Also, I should mention that if your full model (i.e. with all features) does not achieve R^2 > 0.85, then a reduced feature set cannot achieve that either, because ordinary R^2 never decreases when features are added to a least-squares fit. Is that what you were hoping for?
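To illustrate the point about the full model, here is a minimal sketch that compares ordinary R^2 for a full model and a nested subset. The synthetic data and every variable name are assumptions made up for the illustration, not something from the original question.

```matlab
% Sketch with synthetic data (all names and values here are assumptions).
rng(1);                                  % reproducible random numbers
n = 100;
X = randn(n,5);                          % five candidate features
Y = X(:,1) + 0.5*X(:,2) + randn(n,1);    % only features 1 and 2 matter

full    = fitlm(X,Y);                    % model with all five features
reduced = fitlm(X(:,1:2),Y);             % model with a nested subset

r2Full    = full.Rsquared.Ordinary;
r2Reduced = reduced.Rsquared.Ordinary;
fprintf('full: %.4f, reduced: %.4f\n', r2Full, r2Reduced);
```

The reduced model's ordinary R^2 should never exceed the full model's. Adjusted R^2 (Rsquared.Adjusted) does not have this property, so it can behave differently.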


1 Answer

Answer by the cyclist
on 10 Jun 2019
Edited by the cyclist
on 10 Jun 2019

I believe you just need to redefine the critfun function from the one in the example:
function dev = critfun(X,Y)
model = fitglm(X,Y,'Distribution','binomial');
dev = model.Deviance;
end
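replacing the returned value with
dev = model.Rsquared.Ordinary;
(For an ordinary linear regression, the model would be fit with fitlm rather than fitglm with a binomial distribution.) You might want to rename that variable something like rsqr, to avoid confusion.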
EDIT:
After reading that example, and thinking about it a bit more, there might be some other nuances. That example states, "Adding a feature with no effect reduces the deviance by an amount that has a chi-square distribution with one degree of freedom". I'm not sure the same is true for R^2. So, that might bear some thought.
Also, sequentialfs minimizes the criterion (as it does with the deviance), whereas R^2 should be maximized, so an adjustment is needed for that as well. (One simple possibility would be to return 1-R^2 from the criterion function, I guess.)
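The 1-R^2 idea could be sketched roughly as follows. This is not the code from the MathWorks example: the synthetic data, the anonymous-function form of critfun, and the 0.01 tolerance are all assumptions chosen for illustration. With no cross-validation, in-sample R^2 improves a little for every added feature, so some explicit tolerance seems necessary to stop the search.

```matlab
% Sketch only: synthetic data and variable names are assumptions.
rng(0);                                     % reproducible synthetic data
n = 200;
X = randn(n,6);                             % six candidate features
Y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(n,1);   % only features 1 and 4 matter

% sequentialfs minimizes its criterion, so return 1 - R^2 rather than R^2.
rsq     = @(mdl) mdl.Rsquared.Ordinary;     % ordinary R^2 of a LinearModel
critfun = @(X,Y) 1 - rsq(fitlm(X,Y));

% Assumed stopping rule: only add a feature if it lowers 1 - R^2
% (i.e. raises R^2) by more than 0.01 in absolute terms.
opt = statset('TolFun',0.01,'TolTypeFun','abs');
inmodel = sequentialfs(critfun,X,Y,'cv','none', ...
    'options',opt,'direction','forward');

selected = find(inmodel);                   % column indices of chosen features
disp(selected)
```

Whether an absolute gain of 0.01 is a sensible threshold depends on the data; the chi-square argument from the deviance example does not carry over to this criterion.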

  4 Comments

Well, right off the bat I can say that
function dev = critfun(input,N)
model = fitlm(input,N);
R= model.Rsquared.Ordinary;
end
is not going to work, because you have not defined the output variable dev. You have only defined the variable R.
That was just a copy-paste error. I had run it with the correct variable and still could not get the code to select features.
I have to admit that I have not tried to deeply understand the example. But it seems to me that you still need to deal with the fact that you want to maximize R^2, not minimize it.
Also, I think you have not fully understood the purpose of the lines
maxR=chi2inv(0.4,1);
...
'TolFun',maxdev,...
(where I assume the mismatch here is another typo).
That line does not set the absolute level of R^2 at which the search stops. It sets how much the criterion must improve relative to the previous model with fewer features (I think).
All in all, my impression is that you are trying to make these changes without getting a deeper understanding of what everything is doing, which is hazardous to getting the correct result.
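One way to see what TolFun is compared against is to look at the criterion history that sequentialfs returns as its second output. The sketch below is an assumption-laden illustration: the data are synthetic, the criterion is 1-R^2 from fitlm, and the 0.01 absolute tolerance is arbitrary.

```matlab
% Sketch only: synthetic data; the 0.01 tolerance is an arbitrary assumption.
rng(0);
n = 200;
X = randn(n,6);
Y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(n,1);   % only features 1 and 4 matter

rsq     = @(mdl) mdl.Rsquared.Ordinary;     % ordinary R^2 of a LinearModel
critfun = @(X,Y) 1 - rsq(fitlm(X,Y));       % criterion to be minimized

opt = statset('TolFun',0.01,'TolTypeFun','abs');
[inmodel,history] = sequentialfs(critfun,X,Y,'cv','none', ...
    'options',opt,'direction','forward');

disp(history.Crit)          % criterion (1 - R^2) recorded at each step
disp(-diff(history.Crit))   % improvement from one recorded step to the next
```

As I understand it, the search stops once the best remaining candidate improves the criterion by less than the tolerance, so the tolerance acts on step-to-step changes and not on the final R^2 value itself.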
