RegressionBaggedEnsemble

Regression ensemble grown by resampling

Description

RegressionBaggedEnsemble combines a set of trained weak learner models and the data on which these learners were trained. It predicts the ensemble response for new data by aggregating the predictions from its weak learners.

Creation

Description

Create a bagged regression ensemble object using fitrensemble. Set the name-value pair argument 'Method' of fitrensemble to 'Bag' to use bootstrap aggregation (bagging), for example, to grow a random forest.

For a description of bagged ensembles, see Bootstrap Aggregation (Bagging) and Random Forest.

Properties

`BinEdges` — Bin edges for numeric predictors
cell array of p numeric vectors

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
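
As a minimal sketch of how BinEdges gets populated, the following trains a bagged ensemble on the carsmall data with the 'NumBins' name-value argument set. The value 20 is an arbitrary choice for illustration, and the variable name numBinsPerPredictor is hypothetical.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];

% Request binning of the numeric predictors (20 is an arbitrary choice).
Mdl = fitrensemble(X,MPG,'Method','Bag','NumBins',20);

% BinEdges holds one cell per predictor.
edges = Mdl.BinEdges;

% The reproduction code above pads each edge vector with -Inf and Inf,
% so a vector of k interior edges corresponds to k+1 bins.
numBinsPerPredictor = cellfun(@(e) numel(e)+1, edges)
```

A predictor with few distinct values, such as Cylinders, can end up with fewer bins than requested.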

`CategoricalPredictors` — Indices of categorical predictors
vector of positive integers | `[]`

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

`CombineWeights` — How the ensemble combines weak learner weights
`'WeightedAverage'` | `'WeightedSum'`

This property is read-only.

How the ensemble combines weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char

`ExpandedPredictorNames` — Expanded predictor names
cell array of character vectors

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

`FitInfo` — Fit information
numeric array

Fit information, returned as a numeric array. The FitInfoDescription property describes the content of this array.

Data Types: double

`FitInfoDescription` — Description of information in `FitInfo`
character vector | cell array of character vectors

Description of the information in FitInfo, returned as a character vector or cell array of character vectors.

Data Types: char | cell

`FResample` — Fraction of training data resampled during object construction
numeric scalar between `0` and `1`

Fraction of training data resampled during object construction, returned as a numeric scalar between 0 and 1. fitrensemble resamples the training data at random for every weak learner when constructing the ensemble.

Data Types: double

`HyperparameterOptimizationResults` — Description of cross-validation optimization of hyperparameters
`BayesianOptimization` object | table of hyperparameters and associated values

This property is read-only.

Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. The value depends on the optimizer selected through the HyperparameterOptimizationOptions name-value pair at creation:

'bayesopt' (default) — Object of class BayesianOptimization
'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

`LearnerNames` — Names of weak learners in ensemble
cell array of character vectors

This property is read-only.

Names of weak learners in ensemble, returned as a cell array of character vectors. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Data Types: cell

`Method` — Method that creates ensemble
character vector

Method that fitrensemble uses to create the ensemble, returned as a character vector.

Data Types: char

`ModelParameters` — Parameters used in training ensemble
`EnsembleParams` object

Parameters used in training the ensemble, returned as an EnsembleParams object. The properties of ModelParameters include the type of ensemble, either 'classification' or 'regression', the Method used to create the ensemble, and other parameters, depending on the ensemble.

`NumObservations` — Number of observations in the training data
positive integer

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

`NumTrained` — Number of trained weak learners
positive integer

This property is read-only.

Number of trained weak learners in the ensemble, returned as a positive integer.

Data Types: double

`PredictorNames` — Predictor names
cell array of character vectors

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

`ReasonForTermination` — Reason that `fitrensemble` stopped adding weak learners to the ensemble
character vector

This property is read-only.

Reason that fitrensemble stopped adding weak learners to the ensemble, returned as a character vector.

Data Types: char

`Regularization` — Result of using `regularize` on ensemble
structure

Result of using the regularize method on the ensemble, returned as a structure. Use Regularization with shrink to lower resubstitution error and shrink the ensemble.

Data Types: struct

`Replace` — Indication that ensemble was trained with replacement
`true` | `false`

Indication that the ensemble was trained with replacement, returned as true or false.

Data Types: logical
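
The FResample and Replace properties record the resampling settings passed to fitrensemble. The sketch below, which assumes the carsmall data and an arbitrary fraction of 0.5, trains without replacement and then reads the settings back from the model. The variable name nPerLearner is hypothetical.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];
Y = MPG;

% Each learner sees a random half of the observations, drawn without replacement.
Mdl = fitrensemble(X,Y,'Method','Bag','FResample',0.5,'Replace','off');

Mdl.FResample                            % fraction of the data resampled per learner
Mdl.Replace                              % false when sampling without replacement

% Number of distinct observations used by each learner.
nPerLearner = sum(Mdl.UseObsForLearner,1);
```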

`ResponseName` — Name of the response variable
character vector

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

`ResponseTransform` — Function for transforming raw response values
`'none'` (default) | function handle | function name

Function for transforming raw response values, specified as a function handle or function name. The default is 'none', which means @(y)y, or no transformation. The function should accept a vector (the original response values) and return a vector of the same size (the transformed response values).

Example: Suppose you create a function handle that applies an exponential transformation to an input vector by using myfunction = @(y)exp(y). Then, you can specify the response transformation as 'ResponseTransform',myfunction.

Data Types: char | string | function_handle
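
One possible use of ResponseTransform, sketched here under the assumption that the property can be set with dot notation after training and that predict applies it to the aggregated raw response: train on log(MPG), then set the transform to exp so that predictions come back on the original MPG scale.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];

% Train on the log of the response.
Mdl = fitrensemble(X,log(MPG),'Method','Bag');
yLog = predict(Mdl,X(1:5,:));            % predictions on the log scale

% Assumption: ResponseTransform is writable with dot notation.
Mdl.ResponseTransform = @exp;
yMPG = predict(Mdl,X(1:5,:));            % predictions on the MPG scale
```

Because the transform here is a named function rather than an anonymous one, it also sidesteps the anonymous-function restriction listed under code generation.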

`Trained` — Trained regression models
cell vector

Trained regression models, returned as a cell vector. The entries of the cell vector contain the corresponding compact regression models.

If Method is 'LogitBoost' or 'GentleBoost', then the ensemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner.

Data Types: cell

`TrainedWeights` — Trained weak learner weights
numeric vector

This property is read-only.

Trained weights for the weak learners in the ensemble, returned as a numeric vector. TrainedWeights has T elements, where T is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double
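
To make the aggregation concrete, the following sketch recomputes the ensemble prediction by hand as a weighted average of the individual learner predictions. It assumes that CombineWeights is 'WeightedAverage' and that no ResponseTransform is set. The names P, w, yManual, and yEns are hypothetical.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];
Mdl = fitrensemble(X,MPG,'Method','Bag');

Xnew = X(1:5,:);
T = Mdl.NumTrained;

% Collect the prediction of every weak learner, one column per learner.
P = zeros(size(Xnew,1),T);
for t = 1:T
    P(:,t) = predict(Mdl.Trained{t},Xnew);
end

% Weighted average of the learner predictions.
w = Mdl.TrainedWeights(:);
yManual = P*w/sum(w);

% Compare with the prediction of the ensemble itself.
yEns = predict(Mdl,Xnew);
maxDiff = max(abs(yManual - yEns))
```

Under these assumptions, maxDiff should be zero up to floating-point error.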

`UseObsForLearner` — Indicator that observation was used to train learner
`N`-by-`NumTrained` logical matrix

Indicator that observation was used to train learner, returned as a logical matrix of size N-by-NumTrained, where N is the number of rows of training data and NumTrained is the number of trained weak learners. UseObsForLearner(I,J) is true if observation I was used for training learner J, and is false otherwise.

Data Types: logical
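
As a short illustration, UseObsForLearner can be used to see how many observations are out of bag for each learner. This sketch assumes the carsmall data, and the variable names are hypothetical.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];
Mdl = fitrensemble(X,MPG,'Method','Bag');

inBag = Mdl.UseObsForLearner;            % N-by-NumTrained logical matrix

% Fraction of observations that each learner did not see during training.
oobFractionPerLearner = mean(~inBag,1);

% Number of learners for which each observation is out of bag.
numOOBLearnersPerObs = sum(~inBag,2);
```

Out-of-bag functions such as oobPredict and oobLoss rely on this bookkeeping: an observation is predicted only by the learners that did not train on it.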

`W` — Scaled weights in ensemble
numeric vector

This property is read-only.

Scaled observation weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data.

Data Types: double

`X` — Predictor values
real matrix | table

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

`Y` — Response values
numeric vector

This property is read-only.

Response values corresponding to the rows of X, returned as a numeric vector. Each entry of Y is the response for the observation in the corresponding row of X.

Object Functions

`compact`	Reduce size of regression ensemble model
`crossval`	Cross-validate machine learning model
`cvshrink`	Cross-validate pruning and regularization of regression ensemble
`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
`loss`	Regression error for regression ensemble model
`oobLoss`	Out-of-bag error for bagged regression ensemble model
`oobPermutedPredictorImportance`	Out-of-bag predictor importance estimates for random forest of regression trees by permutation
`oobPredict`	Predict out-of-bag responses of bagged regression ensemble
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict responses using regression ensemble model
`predictorImportance`	Estimates of predictor importance for regression ensemble of decision trees
`regularize`	Find optimal weights for learners in regression ensemble
`removeLearners`	Remove members of compact regression ensemble
`resubLoss`	Resubstitution loss for regression ensemble model
`resubPredict`	Predict response of regression ensemble by resubstitution
`resume`	Resume training of regression ensemble model
`shapley`	Shapley values
`shrink`	Prune regression ensemble
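
The following sketch strings several of these object functions together on the carsmall data. The query point [3000 4] (a weight of 3000 and 4 cylinders) is an arbitrary illustration.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];
Mdl = fitrensemble(X,MPG,'Method','Bag');

% Out-of-bag predictions for the training observations.
yOOB = oobPredict(Mdl);

% Predictor importance estimates, one value per predictor.
imp = predictorImportance(Mdl);

% Drop the training data, then predict for a new observation.
CMdl = compact(Mdl);
yNew = predict(CMdl,[3000 4]);
```

The compact model no longer stores X and Y, so functions that need the training data, such as oobPredict and resubLoss, must be called on the full model.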

Examples

Train Bagged Ensemble of Regression Trees

Load the carsmall data set. Consider a model that explains a car's fuel economy (MPG) using its weight (Weight) and number of cylinders (Cylinders).

load carsmall
X = [Weight Cylinders];
Y = MPG;

Train a bagged ensemble of 100 regression trees using all measurements.

Mdl = fitrensemble(X,Y,'Method','bag')

Mdl = 
  RegressionBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 94
               NumTrained: 100
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
           Regularization: []
                FResample: 1
                  Replace: 1
         UseObsForLearner: [94x100 logical]

Mdl is a RegressionBaggedEnsemble model object.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained, compact regression trees (CompactRegressionTree model objects) that compose the ensemble.

Plot a graph of the first trained regression tree.

view(Mdl.Trained{1},'Mode','graph')

By default, fitrensemble grows deep trees for bags of trees.

Estimate the in-sample mean-squared error (MSE).

L = resubLoss(Mdl)

L = 12.4048
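
The in-sample MSE tends to be optimistic for deep trees. As a hedged complement to the example above, this sketch also estimates the out-of-bag MSE and its cumulative curve over the number of learners. Exact values depend on the random number generator state, so none are shown.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];
Y = MPG;
Mdl = fitrensemble(X,Y,'Method','Bag');

resubMSE = resubLoss(Mdl);               % in-sample MSE
oobMSE = oobLoss(Mdl);                   % out-of-bag MSE of the full ensemble

% Out-of-bag MSE after 1, 2, ..., NumTrained learners.
oobCurve = oobLoss(Mdl,'Mode','cumulative');
```

Plotting oobCurve against the learner index is one way to judge whether adding more trees still reduces the error.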

Tips

For a bagged ensemble of regression trees, the Trained property of ens stores a cell vector of ens.NumTrained CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(ens.Trained{t})

Alternative Functionality

Bootstrap Aggregation Methods

For classification or regression, you can choose between two approaches for bagging:

Classification: create a bagged ensemble using fitcensemble or TreeBagger.
Regression: create a bagged ensemble using fitrensemble or TreeBagger.

For help choosing between these approaches, see Ensemble Algorithms and Suggestions for Choosing an Appropriate Ensemble Algorithm.
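
As a rough side-by-side sketch for regression, the code below grows 100 bagged trees with each approach on the carsmall data and estimates the out-of-bag MSE of both. Rows with a missing response are removed first as a precaution. The two estimates need not match, because the functions differ in their defaults and in their random sampling.

```matlab
rng(1)                                   % for reproducibility
load carsmall
keep = ~isnan(MPG);                      % drop rows with a missing response
X = [Weight(keep) Cylinders(keep)];
Y = MPG(keep);

% Approach 1: TreeBagger, storing the out-of-bag information.
B = TreeBagger(100,X,Y,'Method','regression','OOBPrediction','on');
oobMSE_TB = oobError(B,'Mode','ensemble');

% Approach 2: fitrensemble with bagging.
Mdl = fitrensemble(X,Y,'Method','Bag','NumLearningCycles',100);
oobMSE_ens = oobLoss(Mdl);
```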

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict function supports code generation.
To integrate the prediction of an ensemble into Simulink®, you can use the RegressionEnsemble Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train an ensemble by using fitrensemble, the following restrictions apply.
- The value of the ResponseTransform name-value argument cannot be an anonymous function.
- Code generation limitations for regression trees also apply to ensembles of regression trees. You cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
For fixed-point code generation, the following additional restrictions apply.
- When you train an ensemble by using fitrensemble, the value of the ResponseTransform name-value argument must be 'none' (default).
- Categorical predictors (logical, categorical, char, string, or cell) are not supported. You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.

For more information, see Introduction to Code Generation.
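
A typical code generation workflow can be sketched as follows. It uses saveLearnerForCoder and loadLearnerForCoder, which are not described on this page, and the file name BaggedMdl and the entry-point function name predictMPG are hypothetical. The codegen call is left as a comment because it requires MATLAB Coder.

```matlab
rng(1)                                   % for reproducibility
load carsmall
X = [Weight Cylinders];

% Train with the defaults (no surrogate splits, no response transform).
Mdl = fitrensemble(X,MPG,'Method','Bag');

% Save the model in a form that generated code can load (writes BaggedMdl.mat).
saveLearnerForCoder(Mdl,'BaggedMdl');

% Check in MATLAB that the reloaded model predicts like the original.
MdlLoaded = loadLearnerForCoder('BaggedMdl');
yhat = predict(MdlLoaded,X(1:3,:));

% Entry-point function, saved in its own file predictMPG.m:
%   function yhat = predictMPG(Xnew) %#codegen
%   Mdl = loadLearnerForCoder('BaggedMdl');
%   yhat = predict(Mdl,Xnew);
%   end
%
% Generate code for a variable number of rows and two columns:
%   codegen predictMPG -args {coder.typeof(0,[Inf 2],[1 0])}
```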

Version History

Introduced in R2011a
