Statistical test in sequentialfs?
Alexis Moscoso Rial
on 23 Oct 2017
Commented: Bibhavari Bandyopadhyay
on 22 Jul 2019
Hello,
I'm using sequentialfs to find the most suitable subset of features for my binary classification problem. As described in the sequentialfs documentation, the algorithm adds a feature if, under 10-fold cross-validation, its mean criterion (averaged over the 10 folds) is the smallest among the candidate features and is smaller than the mean criterion of the model without that feature. My question is: does sequentialfs use some kind of statistical test to compare the criterion values of the model without the feature and the model with it, or is it just a comparison of mean criterion values (if the mean criterion with n features is greater than the mean criterion with n+1 features, add the feature)?
Thanks!
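To make the question concrete, here is a minimal Python sketch of the "mean criterion" it refers to: one criterion value is computed per cross-validation fold, and the values are averaged into a single number. The contiguous, unshuffled folds and the `fold_criterion` callable are hypothetical stand-ins for illustration; this is not how sequentialfs partitions data internally.

```python
from statistics import mean
from typing import Callable, List, Sequence


def kfold_splits(n: int, k: int) -> List[List[int]]:
    """Split row indices 0..n-1 into k contiguous folds whose sizes differ by at most 1 (no shuffling)."""
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_cv_criterion(
    n: int,
    k: int,
    fold_criterion: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Average a per-fold criterion (e.g. misclassification rate) over k folds.

    fold_criterion(train_idx, test_idx) is a hypothetical user function that
    trains on train_idx and returns the criterion measured on test_idx.
    """
    values = []
    for test_idx in kfold_splits(n, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        values.append(fold_criterion(train_idx, test_idx))
    return mean(values)  # a single number: per-fold spread is discarded here
```

Note that once the folds are collapsed into a mean, any information about fold-to-fold variability is gone, which is exactly what a statistical test would need.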
2 Comments
Scott Weidenkopf
on 31 Oct 2017
After computing the mean criterion values for each candidate feature subset, sequentialfs chooses the candidate feature subset that minimizes the mean criterion value. This process continues until adding more features does not decrease the criterion.
I am not sure I understand your question. What sort of statistical test are you referring to?
Answers (1)
Scott Weidenkopf
on 3 Nov 2017
'sequentialfs' simply compares the mean criterion values of the candidate subsets after performing the cross-validation. Below is the algorithm described in the blog post, with the third step reworded.
- Start by testing each possible predictor one at a time. Identify the single predictor that generates the most accurate model. This predictor is automatically added to the model.
- Next, one at a time, add each of the remaining predictors to a model that includes the single best variable. Identify the variable that improves the accuracy of the model the most.
- Test the two models for predictive accuracy. If the new model is not more accurate than the original model within a specified tolerance, stop the process. If, however, the new model has better predictive accuracy, go on to search for the third best variable.
- Repeat this process until you can't identify a new variable that improves the predictive accuracy of the model.
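The four steps above amount to greedy forward selection with a stopping tolerance. Below is a minimal Python sketch of that logic, assuming lower criterion values are better and that `criterion` (a hypothetical callable) already returns the mean cross-validated criterion for a given feature subset. It mirrors the steps as described, not the actual sequentialfs source.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    n_features: int,
    criterion: Callable[[Sequence[int]], float],
    tol: float = 1e-6,
) -> Tuple[List[int], float]:
    """Greedy forward selection where a lower criterion is better."""
    selected: List[int] = []
    best_crit = float("inf")  # so the first (best single) feature is always accepted
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scores = {f: criterion(selected + [f]) for f in candidates}
        f_best = min(scores, key=scores.get)  # ties go to the lowest index
        # Accept only if the best candidate lowers the criterion by more than tol.
        if not (scores[f_best] < best_crit - tol):
            break
        selected.append(f_best)
        best_crit = scores[f_best]
    return selected, best_crit


# Toy criterion: each feature lowers the criterion by a fixed amount.
contrib = {0: 0.2, 1: 1e-7, 2: 0.5}
toy = lambda s: 1.0 - sum(contrib[f] for f in s)
print(forward_select(3, toy))           # selects [2, 0]; feature 1's 1e-7 gain is below tol
print(forward_select(3, toy, tol=0.0))  # with tol=0 the tiny gain is enough to add feature 1
```

The toy run shows why the tolerance matters: with a nonzero tolerance, a negligible improvement does not justify adding a feature, while with a tolerance of 0 any decrease, however small, does.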
There is a 'significance test' of sorts being performed here, in the sense that each improvement in model accuracy is measured against a tolerance. The tolerance can be specified in the 'TolFun' field of the 'options' structure that can be passed to 'sequentialfs'. This value defaults to 1e-6 or 0, depending on the direction of the sequential search.
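To illustrate how this differs from a statistical test, here is a hedged Python sketch of a mean-versus-tolerance rule. The function name `accepts_feature` and the per-fold numbers are made up for illustration. Two candidates with the same mean improvement but very different fold-to-fold spread get the same decision, because only the means and the tolerance enter the comparison; a paired test on the per-fold values would weigh the two cases very differently.

```python
from statistics import mean
from typing import Sequence


def accepts_feature(
    fold_crit_without: Sequence[float],
    fold_crit_with: Sequence[float],
    tol: float = 1e-6,
) -> bool:
    """Mean-vs-tolerance rule: only the fold means enter the decision, not their spread."""
    return mean(fold_crit_with) < mean(fold_crit_without) - tol


baseline = [0.20, 0.20, 0.20, 0.20]  # per-fold criterion without the candidate feature
steady = [0.15, 0.15, 0.15, 0.15]    # same improvement on every fold
noisy = [0.00, 0.30, 0.00, 0.30]     # same mean, large fold-to-fold spread
print(accepts_feature(baseline, steady), accepts_feature(baseline, noisy))
```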
2 Comments
Bibhavari Bandyopadhyay
on 22 Jul 2019
I am using the sequentialfs function and get an error at line 363 of the sequentialfs.m file. It says "incorrect assignment due to incorrect number of rows". Kindly help me with this as soon as possible.