describe
Description
describe(Transformer) prints the description of the features generated by Transformer. Create the FeatureTransformer object Transformer by using the gencfeatures or genrfeatures function.
describe(Transformer,Index) prints the description of the features identified by Index.
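As a minimal sketch of the two syntaxes, the following code generates features with gencfeatures and then describes all of them or only a subset. It assumes Statistics and Machine Learning Toolbox is installed and uses the outages.csv sample file from the examples on this page. The variable names InfoAll and InfoFirst are illustrative only.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox is available.
outages = readtable("outages.csv");   % sample file used in the examples on this page
Tbl = rmmissing(outages);             % drop observations with missing values

% Create the FeatureTransformer object for a classification problem.
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag");

% First syntax: describe every generated feature.
InfoAll = describe(Transformer);

% Second syntax: describe only the features at positions 1 through 5.
InfoFirst = describe(Transformer,1:5);
```

Capturing the output in a variable, as the examples on this page do, makes it possible to index and filter the descriptions later.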
Examples
Generate and Inspect Features for Classification Problem
Generate features from a table of predictor data by using gencfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
   Region          OutageTime        Loss     Customers     RestorationTime            Cause
_____________    ________________    ______    __________    ________________    ___________________
{'SouthWest'}    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
{'SouthEast'}    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    {'winter storm'   }
{'West'     }    2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
{'MidWest'  }    2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
{'West'     }    2003-06-18 02:49         0             0    2003-06-18 10:54    {'attack'         }
{'NorthEast'}    2003-07-16 16:23    239.93         49434    2003-07-17 01:12    {'fire'           }
{'MidWest'  }    2004-09-27 11:09    286.72         66104    2004-09-27 16:37    {'equipment fault'}
{'SouthEast'}    2004-09-05 17:48    73.387         36073    2004-09-05 20:46    {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by classifier training functions like fitcensemble.
Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Region table variable as the response.
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag")
Transformer =
  FeatureTransformer with properties:

                     Type: 'classification'
            TargetLearner: 'bag'
    NumEngineeredFeatures: 22
      NumOriginalFeatures: 3
         TotalNumFeatures: 25
The Transformer object contains the information about the generated features and the transformations used to create them.
To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ _________________________________________________________________________________________________________________
Loss Numeric true Loss ""
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
sdn(OutageTime) Numeric false OutageTime "Serial date number from 01-Feb-2002 12:18:00"
woe3(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = SouthEast)"
doy(OutageTime) Numeric false OutageTime "Day of the year"
year(OutageTime) Numeric false OutageTime "Year"
kmd1 Numeric false Loss, Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd5 Numeric false Loss, Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
quarter(OutageTime) Numeric false OutageTime "Quarter of the year"
woe2(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = NorthEast)"
year(RestorationTime) Numeric false RestorationTime "Year"
month(OutageTime) Numeric false OutageTime "Month of the year"
Loss.*Customers Numeric false Loss, Customers "Loss .* Customers"
tods(OutageTime) Numeric false OutageTime "Time of the day in seconds"
⋮
The Info table indicates the following:
- The first three generated features are original to Tbl, although the software converts the original Cause variable to a categorical variable c(Cause).
- The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives many of the generated features from these variables, such as the fourth feature RestorationTime-OutageTime.
- Some generated features are a combination of multiple transformations. For example, the software generates the sixth feature woe3(c(Cause)) by converting the Cause variable to a categorical variable and then calculating the Weight of Evidence values for the resulting variable.
Generate and Inspect Features for Regression Problem
Generate features from a table of predictor data by using genrfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
   Region          OutageTime        Loss     Customers     RestorationTime            Cause
_____________    ________________    ______    __________    ________________    ___________________
{'SouthWest'}    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    {'winter storm'   }
{'SouthEast'}    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    {'winter storm'   }
{'West'     }    2004-04-06 05:44    434.81    3.4037e+05    2004-04-06 06:10    {'equipment fault'}
{'MidWest'  }    2002-03-16 06:18    186.44    2.1275e+05    2002-03-18 23:23    {'severe storm'   }
{'West'     }    2003-06-18 02:49         0             0    2003-06-18 10:54    {'attack'         }
{'NorthEast'}    2003-07-16 16:23    239.93         49434    2003-07-17 01:12    {'fire'           }
{'MidWest'  }    2004-09-27 11:09    286.72         66104    2004-09-27 16:37    {'equipment fault'}
{'SouthEast'}    2004-09-05 17:48    73.387         36073    2004-09-05 20:46    {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by regression model training functions like fitrensemble.
Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Loss table variable as the response.
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag")
Transformer =
  FeatureTransformer with properties:

                     Type: 'regression'
            TargetLearner: 'bag'
    NumEngineeredFeatures: 22
      NumOriginalFeatures: 3
         TotalNumFeatures: 25
The Transformer object contains the information about the generated features and the transformations used to create them.
To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ ___________________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
kmd2 Numeric false Customers "Euclidean distance to centroid 2 (kmeans clustering with k = 10)"
kmd1 Numeric false Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd4 Numeric false Customers "Euclidean distance to centroid 4 (kmeans clustering with k = 10)"
kmd5 Numeric false Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
kmd9 Numeric false Customers "Euclidean distance to centroid 9 (kmeans clustering with k = 10)"
cos(Customers) Numeric false Customers "cos( )"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
kmd6 Numeric false Customers "Euclidean distance to centroid 6 (kmeans clustering with k = 10)"
kmi Categorical false Customers "Cluster index encoding (kmeans clustering with k = 10)"
kmd7 Numeric false Customers "Euclidean distance to centroid 7 (kmeans clustering with k = 10)"
kmd3 Numeric false Customers "Euclidean distance to centroid 3 (kmeans clustering with k = 10)"
kmd10 Numeric false Customers "Euclidean distance to centroid 10 (kmeans clustering with k = 10)"
hour(RestorationTime) Numeric false RestorationTime "Hour of the day"
⋮
The first three generated features are original to Tbl, although the software converts the original Region and Cause variables to categorical variables.
Info(1:3,:) % describe(Transformer,1:3)
ans=3×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ______________ ______________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives some generated features from these variables, such as the tenth feature RestorationTime-OutageTime.
Info(10,:) % describe(Transformer,10)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ___________________________ ________________________________________________________________
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
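To make the description "Elapsed time in seconds between OutageTime and RestorationTime" concrete, the following sketch computes an equivalent quantity directly from the table. It illustrates what the description means and is not the software's internal code. The variable name elapsedSeconds is hypothetical.

```matlab
% Illustration only: reproduce the meaning of RestorationTime-OutageTime by hand.
outages = readtable("outages.csv");   % sample file used in the examples on this page
Tbl = rmmissing(outages);

% Subtracting two datetime variables yields a duration array;
% seconds() converts that duration to a numeric number of seconds.
elapsedSeconds = seconds(Tbl.RestorationTime - Tbl.OutageTime);
```

The result is a numeric column, which is why the generated feature has Type Numeric even though both input variables are datetime variables.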
Some generated features are a combination of multiple transformations. For example, the software generates the nineteenth feature fenc(c(Cause)) by converting the Cause variable to a categorical variable with 10 categories and then calculating the frequency of the categories.
Info(19,:) % describe(Transformer,19)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ______________ ____________________________________________________________________________________________________________
fenc(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Frequency encoding (number of levels = 10)"
Input Arguments
Transformer
— Feature transformer
FeatureTransformer object
Feature transformer, specified as a FeatureTransformer object.
Index
— Features to describe
numeric vector | logical vector | string array | cell array of character vectors
Features to describe, specified as a numeric or logical vector indicating the position of the features, or a string array or cell array of character vectors indicating the names of the features.
Example: 1:12
Data Types: single | double | logical | string | cell
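The following sketch shows the three forms of Index side by side. It assumes that the feature names are the row names of the table returned by describe, which is how the example output on this page appears to be laid out, and that a logical Index has one element per generated feature. Treat both points as assumptions to verify in your release.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox is available.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag");
Info = describe(Transformer);

% Numeric vector of feature positions.
ByPosition = describe(Transformer,[1 3 10]);

% Logical vector (assumed: one element per generated feature).
mask = false(1,Transformer.TotalNumFeatures);
mask([1 3 10]) = true;
ByMask = describe(Transformer,mask);

% String array of feature names (assumed: names are the row names of Info).
names = string(Info.Properties.RowNames([1 3 10]));
ByName = describe(Transformer,names);
```

If the assumptions hold, the three calls return descriptions of the same three features.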
Output Arguments
Info
— Feature descriptions
table
Feature descriptions, returned as a table. Each row corresponds to a generated feature, and each column provides the following information.
Column Name | Description |
---|---|
Type | Indicates the data type of the feature, either numeric or categorical |
IsOriginal | Indicates whether the feature is an original feature (true) or an engineered feature (false) |
InputVariables | Indicates the original features used to generate the feature |
Transformations | Describes the transformations used to generate the feature, in the order they are applied. For more information, see Feature Transformations. |
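Because Info is an ordinary table, its columns can be used to filter the descriptions. The sketch below assumes that IsOriginal is a logical column and converts Type and InputVariables with string so that the comparisons work whether those columns are stored as categorical, string, or cell values. The storage types are an assumption, not something this page states.

```matlab
% Sketch only: assumes Statistics and Machine Learning Toolbox is available.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag");
Info = describe(Transformer);

% Keep only the engineered (non-original) features.
engineered = Info(~Info.IsOriginal,:);

% Keep only the numeric features.
numericOnly = Info(string(Info.Type) == "Numeric",:);

% Keep only the features derived from the OutageTime variable.
fromOutageTime = Info(contains(string(Info.InputVariables),"OutageTime"),:);
```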
Algorithms
Feature Transformations
This table provides additional information on some of the more complex feature transformation descriptions in Info.Transformations.
Sample Feature Name | Sample Transformation Description in Info | Additional Information |
---|---|---|
eb4(Variable) | Equal-width binning (number of bins = 4) | The software splits the Variable values into 4 bins of equal width. The resulting feature is a categorical variable. |
fenc(Variable) | Frequency encoding (number of levels = 10) | The software calculates the frequency of the 10 categories (or levels) in Variable. In the resulting feature, the software replaces each categorical value with the corresponding category frequency, creating a numeric variable. |
kmc1 | Centroid encoding (component #1) (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature corresponds to an observation and is the 1st component of the cluster centroid associated with that observation. The resulting feature is a numeric variable. |
kmd4 | Euclidean distance to centroid 4 (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the Euclidean distance from the corresponding observation to the centroid of the 4th cluster. The resulting feature is a numeric variable. |
kmi | Cluster index encoding (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the cluster index for the corresponding observation. The resulting feature is a categorical variable. |
q50(Variable) | Equiprobable binning (number of bins = 50) | The software splits the Variable values into 50 bins of equal probability. The resulting feature is a categorical variable. |
woe5(Variable) | Weight of Evidence (positive class = Class5) | This transformation is available for classification problems only. The software calculates the Weight of Evidence values for Variable, treating Class5 as the positive class. |
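The transformations in the table can be approximated by hand on toy data, which may help when reading the descriptions in Info.Transformations. The sketch below is not the software's implementation: the data, the variable names, the choice of two clusters, and the use of relative (rather than absolute) frequencies are all assumptions made for illustration. The kmeans call requires Statistics and Machine Learning Toolbox.

```matlab
% Illustration only: hand-written analogues of some transformations on toy data.
x = [1; 2; 2; 3; 8; 9; 10; 10];

% Equal-width binning into 4 bins (compare eb4).
edges = linspace(min(x),max(x),5);   % 5 edges define 4 bins of equal width
binIdx = discretize(x,edges);        % bin number for each observation
eb = categorical(binIdx);            % store the bins as a categorical variable

% Frequency encoding (compare fenc), using relative frequencies as an assumption.
c = categorical(["a"; "b"; "a"; "c"; "a"; "b"]);
counts = countcats(c);               % counts in the order of categories(c)
fenc = counts(double(c))/numel(c);   % replace each value with its category frequency

% k-means encodings (compare kmi, kmd1, and kmc1), with k = 2 for brevity.
rng("default") % For reproducibility
X = [x, x.^2];
[idx,C] = kmeans(X,2);               % idx is the cluster index of each observation
kmi = categorical(idx);              % cluster index encoding
kmd1 = vecnorm(X - C(1,:),2,2);      % Euclidean distance to centroid 1
kmc1 = C(idx,1);                     % 1st component of each observation's centroid
```

In each case the result has one row per observation, which matches how the generated features line up with the rows of the original table.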
Version History
Introduced in R2021a