Identify influential features to improve model performance

Feature selection is a dimensionality reduction technique that selects only a subset of measured features (predictor variables) that provide the best predictive power in modeling the data. It is particularly useful when dealing with very high-dimensional data or when modeling with all features is undesirable.

Feature selection can be used to:

  • Improve the accuracy of a machine learning algorithm
  • Boost performance on very high-dimensional data
  • Improve model interpretability
  • Prevent overfitting

There are several common approaches to feature selection:

  • Stepwise regression sequentially adds or removes features until there is no improvement in prediction; it is used with linear regression or generalized linear regression algorithms. Similarly, sequential feature selection, which works with any supervised learning algorithm, builds up a feature set one feature at a time until accuracy (or a custom performance measure) stops improving.
  • Automated feature selection, such as neighborhood component analysis (NCA), identifies the subset of features that maximizes classification performance based on their predictive power.
  • Boosted and bagged decision trees are ensemble methods that compute variable importance; for bagged ensembles, importance can be computed from out-of-bag estimates.
  • Regularization (lasso and elastic nets) is a shrinkage estimator used to remove redundant features by reducing their weights (coefficients) to zero.
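To make the regularization approach concrete, here is a minimal sketch of lasso fitted by cyclic coordinate descent, written in plain Python rather than with any toolbox function. The function names, the toy data, and the penalty value 0.5 are illustrative assumptions, and the sketch assumes centered data with no intercept. It is meant only to show how the penalty drives the weights of weak or irrelevant features to exactly zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + penalty * ||w||_1 coordinate by coordinate.

    X is a list of rows and y a list of targets. No intercept is fitted, so the
    columns of X and the target y are assumed to be centered already.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # mean squared value of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            rho /= n
            norm /= n
            w[j] = soft_threshold(rho, penalty) / norm if norm > 0 else 0.0
    return w


# Toy data (made up for illustration): the target depends strongly on feature 0,
# only weakly on feature 1, and not at all on feature 2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.0 * row[0] + 0.2 * row[1] for row in X]

weights = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print("weights:", weights)
print("selected feature indices:", selected)
```

Features whose weight ends at exactly zero are effectively removed from the model. A larger penalty removes more features, while a penalty of zero reduces to ordinary least squares.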

Another dimensionality reduction approach is to use feature extraction or feature transformation techniques, which transform existing features into new features (predictor variables) with the less descriptive features dropped.
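As an illustration of feature transformation, the sketch below uses principal component analysis (PCA). This is an assumption: the paragraph above does not name a specific technique, and PCA is just one common choice. The code projects two correlated features onto a single new feature, and the function name, toy data, and use of power iteration are illustrative choices, not a reference implementation.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return the leading principal direction and the data projected onto it.

    Uses power iteration on the sample covariance matrix. The start vector of
    all ones is an assumption; it must not be orthogonal to the leading direction.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix of the centered features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0] * p
    for _ in range(n_iter):
        v = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]

    # The new feature is each centered observation's coordinate along v.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Toy data (made up): the second feature is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
direction, new_feature = first_principal_component(data)
print("direction:", direction)
print("transformed feature:", new_feature)
```

Here the two original features collapse into one new feature that keeps most of their joint variation. Unlike feature selection, the result is a combination of the inputs rather than a subset of them.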

Approaches to feature transformation include:

For more information on feature selection, including machine learning, regression, and transformation, see Statistics and Machine Learning Toolbox™ for use with MATLAB®.

See also: Statistics and Machine Learning Toolbox, AdaBoost, machine learning, linear model, regularization, AutoML
