preparedPredictors

Obtain prepared data used for training or testing in direct forecasting

Since R2023b

    Description

    preparedX = preparedPredictors(Mdl,Tbl) uses the training or test set data Tbl and returns the prepared predictor data preparedX, which is used by Mdl.Learners{1} for training or testing at horizon step Mdl.Horizon(1). Mdl must be a direct forecasting model.

    preparedX = preparedPredictors(Mdl,X,Y) returns prepared predictors for the exogenous predictor data X and the response data Y. This syntax assumes that Mdl uses exogenous predictors and lagged response variables as predictors. That is, Mdl.PredictorNames and Mdl.ResponseLags are nonempty.

    preparedX = preparedPredictors(Mdl,X) returns prepared predictors when the model Mdl does not use lagged response variables as predictors. That is, Mdl.ResponseLags must be empty.

    preparedX = preparedPredictors(Mdl,Y) returns prepared predictors when the model Mdl does not use exogenous predictors. That is, Mdl.PredictorNames must be empty.

    preparedX = preparedPredictors(___,HorizonStep=step) specifies the horizon step at which to prepare the predictor data, in addition to any of the input argument combinations in previous syntaxes.
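As a minimal sketch of the HorizonStep argument, the following code trains a small model on synthetic data and prepares predictors at two different horizon steps. The variable names DayOfWeek and Load and the generated values are hypothetical and are not part of any shipped data set.

```matlab
rng(1)                                         % reproducible synthetic data
n = 150;
Time = datetime(2020,1,1) + caldays(0:n-1)';
DayOfWeek = weekday(Time);                     % known in advance, so usable as a leading predictor
Load = 10 + 2*sin(2*pi*(1:n)'/7) + randn(n,1); % hypothetical response series
Tbl = timetable(Time,DayOfWeek,Load);

% Forecast one and three steps ahead.
Mdl = directforecaster(Tbl,"Load",Horizon=[1 3], ...
    LeadingPredictors="all",LeadingPredictorLags={0:1}, ...
    ResponseLags=1:2);

prep1 = preparedPredictors(Mdl,Tbl);                % default step is Mdl.Horizon(1)
prep3 = preparedPredictors(Mdl,Tbl,HorizonStep=3);  % step must be a value in Mdl.Horizon

% Compare the prepared predictor names at the two steps.
disp(prep1.Properties.VariableNames)
disp(prep3.Properties.VariableNames)
```

Both outputs have one row per observation in Tbl. Only the set of prepared predictors changes with the horizon step.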

    Examples

    When you perform direct forecasting using directforecaster, the function creates lagged and leading predictors from the training data before fitting a DirectForecaster model. Similarly, the loss and predict object functions reformat the test data before computing loss and prediction values, respectively.

    This example shows how to access the prepared predictor data used by direct forecasting models for training and testing.

    Load the sample file TemperatureData.csv, which contains average daily temperatures from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.

    temperatures = readtable("TemperatureData.csv");
    head(temperatures)
        Year       Month       Day    TemperatureF
        ____    ___________    ___    ____________
    
        2015    {'January'}     1          23     
        2015    {'January'}     2          31     
        2015    {'January'}     3          25     
        2015    {'January'}     4          39     
        2015    {'January'}     5          29     
        2015    {'January'}     6          12     
        2015    {'January'}     7          10     
        2015    {'January'}     8           4     
    

    For this example, use a subset of the temperature data that omits the first 100 observations.

    Tbl = temperatures(101:end,:);

    Create a datetime variable t that contains the year, month, and day information for each observation in Tbl. Then, use t to convert Tbl into a timetable.

    numericMonth = month(datetime(Tbl.Month, ...
        InputFormat="MMMM",Locale="en_US"));
    t = datetime(Tbl.Year,numericMonth,Tbl.Day);
    Tbl.Time = t;
    Tbl = table2timetable(Tbl);
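The month conversion above can be checked in isolation. This sketch uses three hypothetical month names and dates rather than the sample file, and assumes the names are spelled out in full English.

```matlab
% Hypothetical month names, years, and days standing in for table columns.
monthNames = ["January";"April";"December"];
years = [2015;2015;2016];
days = [1;10;31];

% Parse each full month name, then extract its month number.
monthNums = month(datetime(monthNames, ...
    InputFormat="MMMM",Locale="en_US"));

% Combine year, month number, and day into datetime values.
t = datetime(years,monthNums,days)
```

Setting Locale explicitly keeps the parsing independent of the system language.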

    Plot the temperature values in Tbl over time.

    plot(Tbl.Time,Tbl.TemperatureF)
    xlabel("Date")
    ylabel("Temperature in Fahrenheit")

    Partition the temperature data into training and test sets by using tspartition. Reserve 20% of the observations for testing.

    partition = tspartition(size(Tbl,1),"Holdout",0.20);
    trainingTbl = Tbl(training(partition),:);
    testTbl = Tbl(test(partition),:);
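To see which observations a holdout time series partition assigns to each set, you can inspect the indices directly. This sketch uses a hypothetical series of 10 observations. For time series data, the test set is expected to come after the training set in time.

```matlab
n = 10;                                     % hypothetical number of time-ordered observations
partition = tspartition(n,"Holdout",0.20);

% training and test return logical vectors with one element per observation.
trainIdx = find(training(partition))'
testIdx = find(test(partition))'
```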

    Create a full direct forecasting model by using the data in trainingTbl. Specify the horizon steps as one to seven steps ahead. Train a model at each horizon step using a boosted ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known.

    To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags. For this example, use the following as predictor values: the current and previous Year values, the current and previous Month values, the current and previous seven Day values, and the previous seven TemperatureF values.

    Mdl = directforecaster(trainingTbl,"TemperatureF", ...
        Horizon=1:7,LeadingPredictors="all", ...
        LeadingPredictorLags={0:1,0:1,0:7}, ...
        ResponseLags=1:7)
    Mdl = 
      DirectForecaster
    
                      Horizon: [1 2 3 4 5 6 7]
                 ResponseLags: [1 2 3 4 5 6 7]
            LeadingPredictors: [1 2 3]
         LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
                 ResponseName: 'TemperatureF'
               PredictorNames: {'Year'  'Month'  'Day'}
        CategoricalPredictors: [2]
                     Learners: {7x1 cell}
                       MaxLag: [7]
              NumObservations: [372]
    
    
    

    Mdl is a DirectForecaster model object. Mdl consists of seven regression models: Mdl.Learners{1}, which predicts one step into the future; Mdl.Learners{2}, which predicts two steps into the future; and so on.

    Compare the first and seventh regression models in Mdl.

    Mdl.Learners{1}
    ans = 
      CompactRegressionEnsemble
               PredictorNames: {1x19 cell}
                 ResponseName: 'TemperatureF_Step1'
        CategoricalPredictors: [10 11]
            ResponseTransform: 'none'
                   NumTrained: 100
    
    
    
    Mdl.Learners{7}
    ans = 
      CompactRegressionEnsemble
               PredictorNames: {1x19 cell}
                 ResponseName: 'TemperatureF_Step7'
        CategoricalPredictors: [10 11]
            ResponseTransform: 'none'
                   NumTrained: 100
    
    
    

    The regression models in Mdl are all CompactRegressionEnsemble objects. Because the models are compact, they do not include the predictor data used to train them.

    To see the data used to train the regression models in Mdl, use the preparedPredictors object function.

    Observe the prepared predictor data used to train Mdl.Learners{1}. By default, preparedPredictors returns the prepared predictor data used at horizon step Mdl.Horizon(1), which in this case is one step ahead.

    prepTrainingTbl1 = preparedPredictors(Mdl,trainingTbl)
    prepTrainingTbl1=372×19 timetable
           Time        TemperatureF_Lag1    TemperatureF_Lag2    TemperatureF_Lag3    TemperatureF_Lag4    TemperatureF_Lag5    TemperatureF_Lag6    TemperatureF_Lag7    Year_Step1    Year_Lag1    Month_Step1    Month_Lag1    Day_Step1    Day_Lag1    Day_Lag2    Day_Lag3    Day_Lag4    Day_Lag5    Day_Lag6    Day_Lag7
        ___________    _________________    _________________    _________________    _________________    _________________    _________________    _________________    __________    _________    ___________    __________    _________    ________    ________    ________    ________    ________    ________    ________
    
        10-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          NaN        {'April'}     {0x0 char}       10          NaN         NaN         NaN         NaN         NaN         NaN         NaN   
        11-Apr-2015            41                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015         2015        {'April'}     {'April' }       11           10         NaN         NaN         NaN         NaN         NaN         NaN   
        12-Apr-2015            45                   41                  NaN                  NaN                  NaN                  NaN                  NaN              2015         2015        {'April'}     {'April' }       12           11          10         NaN         NaN         NaN         NaN         NaN   
        13-Apr-2015            49                   45                   41                  NaN                  NaN                  NaN                  NaN              2015         2015        {'April'}     {'April' }       13           12          11          10         NaN         NaN         NaN         NaN   
        14-Apr-2015            50                   49                   45                   41                  NaN                  NaN                  NaN              2015         2015        {'April'}     {'April' }       14           13          12          11          10         NaN         NaN         NaN   
        15-Apr-2015            54                   50                   49                   45                   41                  NaN                  NaN              2015         2015        {'April'}     {'April' }       15           14          13          12          11          10         NaN         NaN   
        16-Apr-2015            54                   54                   50                   49                   45                   41                  NaN              2015         2015        {'April'}     {'April' }       16           15          14          13          12          11          10         NaN   
        17-Apr-2015            46                   54                   54                   50                   49                   45                   41              2015         2015        {'April'}     {'April' }       17           16          15          14          13          12          11          10   
        18-Apr-2015            51                   46                   54                   54                   50                   49                   45              2015         2015        {'April'}     {'April' }       18           17          16          15          14          13          12          11   
        19-Apr-2015            47                   51                   46                   54                   54                   50                   49              2015         2015        {'April'}     {'April' }       19           18          17          16          15          14          13          12   
        20-Apr-2015            41                   47                   51                   46                   54                   54                   50              2015         2015        {'April'}     {'April' }       20           19          18          17          16          15          14          13   
        21-Apr-2015            41                   41                   47                   51                   46                   54                   54              2015         2015        {'April'}     {'April' }       21           20          19          18          17          16          15          14   
        22-Apr-2015            51                   41                   41                   47                   51                   46                   54              2015         2015        {'April'}     {'April' }       22           21          20          19          18          17          16          15   
        23-Apr-2015            50                   51                   41                   41                   47                   51                   46              2015         2015        {'April'}     {'April' }       23           22          21          20          19          18          17          16   
        24-Apr-2015            40                   50                   51                   41                   41                   47                   51              2015         2015        {'April'}     {'April' }       24           23          22          21          20          19          18          17   
        25-Apr-2015            39                   40                   50                   51                   41                   41                   47              2015         2015        {'April'}     {'April' }       25           24          23          22          21          20          19          18   
          ⋮
    
    

    prepTrainingTbl1 contains lagged predictors (with Lag in their names) and leading predictors (with Step in their names). The table contains missing values due to the creation of these prepared predictors. For example, TemperatureF_Lag1 contains a missing value at time 10-Apr-2015 because the temperature at time 09-Apr-2015 is not known.
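The missing values follow directly from how a lagged variable is formed. The sketch below rebuilds the first rows of TemperatureF_Lag1 and TemperatureF_Lag2 by hand, using the seven temperatures that can be read off the displayed table. The shiftBack helper is hypothetical and only illustrates the idea. It is not the function that the software uses internally.

```matlab
% Temperatures for 10-Apr-2015 through 16-Apr-2015, read from the
% TemperatureF_Lag1 column of the displayed table.
temp = [41; 45; 49; 50; 54; 54; 46];
Time = datetime(2015,4,10) + caldays(0:6)';

% Hypothetical helper: shift a column down by k rows and pad the top with NaN.
shiftBack = @(v,k) [NaN(k,1); v(1:end-k)];

manual = timetable(Time,shiftBack(temp,1),shiftBack(temp,2), ...
    VariableNames=["TemperatureF_Lag1","TemperatureF_Lag2"])
```

Each additional lag pushes the first available value one row later, which is why TemperatureF_Lag7 has seven leading missing values at horizon step 1.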

    Observe the prepared predictor data used to train Mdl.Learners{7}.

    prepTrainingTbl7 = preparedPredictors(Mdl,trainingTbl, ...
        HorizonStep=7)
    prepTrainingTbl7=372×19 timetable
           Time        TemperatureF_Lag1    TemperatureF_Lag2    TemperatureF_Lag3    TemperatureF_Lag4    TemperatureF_Lag5    TemperatureF_Lag6    TemperatureF_Lag7    Year_Step7    Year_Step6    Month_Step7    Month_Step6    Day_Step7    Day_Step6    Day_Step5    Day_Step4    Day_Step3    Day_Step2    Day_Step1    Day_Lag1
        ___________    _________________    _________________    _________________    _________________    _________________    _________________    _________________    __________    __________    ___________    ___________    _________    _________    _________    _________    _________    _________    _________    ________
    
        10-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015           NaN        {'April'}     {0x0 char}        10           NaN          NaN          NaN          NaN          NaN          NaN         NaN   
        11-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        11            10          NaN          NaN          NaN          NaN          NaN         NaN   
        12-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        12            11           10          NaN          NaN          NaN          NaN         NaN   
        13-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        13            12           11           10          NaN          NaN          NaN         NaN   
        14-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        14            13           12           11           10          NaN          NaN         NaN   
        15-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        15            14           13           12           11           10          NaN         NaN   
        16-Apr-2015           NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        16            15           14           13           12           11           10         NaN   
        17-Apr-2015            41                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        17            16           15           14           13           12           11          10   
        18-Apr-2015            45                   41                  NaN                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        18            17           16           15           14           13           12          11   
        19-Apr-2015            49                   45                   41                  NaN                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        19            18           17           16           15           14           13          12   
        20-Apr-2015            50                   49                   45                   41                  NaN                  NaN                  NaN              2015          2015        {'April'}     {'April' }        20            19           18           17           16           15           14          13   
        21-Apr-2015            54                   50                   49                   45                   41                  NaN                  NaN              2015          2015        {'April'}     {'April' }        21            20           19           18           17           16           15          14   
        22-Apr-2015            54                   54                   50                   49                   45                   41                  NaN              2015          2015        {'April'}     {'April' }        22            21           20           19           18           17           16          15   
        23-Apr-2015            46                   54                   54                   50                   49                   45                   41              2015          2015        {'April'}     {'April' }        23            22           21           20           19           18           17          16   
        24-Apr-2015            51                   46                   54                   54                   50                   49                   45              2015          2015        {'April'}     {'April' }        24            23           22           21           20           19           18          17   
        25-Apr-2015            47                   51                   46                   54                   54                   50                   49              2015          2015        {'April'}     {'April' }        25            24           23           22           21           20           19          18   
          ⋮
    
    

    Because Mdl.Learners{7} predicts seven steps ahead, prepTrainingTbl7 contains different predictors from the predictors in prepTrainingTbl1. For example, prepTrainingTbl7 contains the predictors Year_Step7 and Year_Step6 instead of the predictors Year_Step1 and Year_Lag1 in prepTrainingTbl1. The step numbers indicate the horizon steps (that is, the number of time steps ahead).
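The two displayed tables suggest a naming pattern for the Day predictors: a leading predictor lag k at horizon step h lies h - k steps ahead of the forecast origin, and gets a Step name when that offset is positive and a Lag name otherwise. The following sketch reproduces the Day variable names under that assumption. The rule is inferred from the output in this example and is not a documented specification.

```matlab
lags = 0:7;                      % Day lags specified in LeadingPredictorLags
for h = [1 7]                    % horizon steps shown in this example
    names = strings(size(lags));
    for i = 1:numel(lags)
        offset = h - lags(i);    % steps ahead of the forecast origin
        if offset >= 1
            names(i) = "Day_Step" + offset;       % value lies after the origin
        else
            names(i) = "Day_Lag" + (1 - offset);  % value lies at or before the origin
        end
    end
    disp(names)
end
```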

    Compute the test set mean squared error at each horizon step.

    mse = loss(Mdl,testTbl)
    mse = 1×7
    
       32.1256   45.3297   49.8831   49.3660   55.7613   50.4300   53.6758
    
    

    Obtain the prepared test set predictor data used by Mdl.Learners{1} to compute mse(1). Compare the variables in prepTestTbl1 and prepTrainingTbl1.

    prepTestTbl1 = preparedPredictors(Mdl,testTbl);
    isequal(prepTrainingTbl1.Properties.VariableNames, ...
        prepTestTbl1.Properties.VariableNames)
    ans = logical
       1
    
    

    The prepared predictor variables in prepTestTbl1 and prepTrainingTbl1 have the same names and appear in the same order.

    Similarly, obtain the prepared test set predictor data used by Mdl.Learners{7} to compute mse(7). Compare the variables in prepTestTbl7 and prepTrainingTbl7.

    prepTestTbl7 = preparedPredictors(Mdl,testTbl, ...
        HorizonStep=7);
    isequal(prepTrainingTbl7.Properties.VariableNames, ...
        prepTestTbl7.Properties.VariableNames)
    ans = logical
       1
    
    

    The prepared predictor variables in prepTestTbl7 and prepTrainingTbl7 also have the same names and appear in the same order.

    Input Arguments

    Mdl: Direct forecasting model, specified as a DirectForecaster or CompactDirectForecaster model object.

    Tbl: Training or test set data, specified as a table or timetable. Each row of Tbl corresponds to one observation, and each column corresponds to one variable. Tbl must have the same data type as the predictor data argument used to train Mdl, and must include all exogenous predictors and the response variable.

    X: Training or test set exogenous predictor data, specified as a numeric matrix, table, or timetable. Each row of X corresponds to one observation, and each column corresponds to one predictor. X must have the same data type as the predictor data argument used to train Mdl, and must consist of the same exogenous predictors.

    Y: Training or test set response data, specified as a numeric vector, one-column table, or one-column timetable. Each row of Y corresponds to one observation.

    • If X is a numeric matrix, then Y must be a numeric vector.

    • If X is a table, then Y must be a numeric vector or one-column table.

    • If X is a timetable or it is not specified, then Y must be a numeric vector, one-column table, or one-column timetable.

    If you specify both X and Y, then they must have the same number of observations.
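As a minimal sketch of the numeric syntax, the following code trains a model on a numeric matrix X and a numeric vector Y with the same number of rows, and then passes the same kinds of inputs to preparedPredictors. The data is synthetic and the two predictor columns are hypothetical.

```matlab
rng(0)                               % reproducible synthetic data
n = 200;
X = randn(n,2);                      % two hypothetical exogenous predictors
Y = cumsum(randn(n,1)) + X(:,1);     % hypothetical response series

% Train with numeric inputs. The model uses exogenous predictors and
% lagged responses, so the three-argument syntax applies.
Mdl = directforecaster(X,Y,Horizon=1:3,ResponseLags=1:4);

% X is a numeric matrix, so Y must be a numeric vector with n rows.
preparedX = preparedPredictors(Mdl,X,Y);

class(preparedX)
size(preparedX)
```

Because the model was trained on numeric data, the prepared predictor data is expected to be numeric as well, with one row per observation.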

    step: Horizon step at which to prepare data, specified as a positive integer scalar. step must be one of the values in Mdl.Horizon.

    If step is element i in Mdl.Horizon, then Mdl.PreparedPredictorsPerHorizon(i,:) indicates the prepared predictors in preparedX.

    Example: HorizonStep=2

    Data Types: single | double
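The relationship between the horizon step and Mdl.PreparedPredictorsPerHorizon can be checked on a small model. This sketch uses synthetic data with hypothetical variable names, and compares the number of predictors flagged for a step with the number of columns that preparedPredictors returns for that step.

```matlab
rng(2)                                          % reproducible synthetic data
n = 120;
Time = datetime(2021,1,1) + caldays(0:n-1)';
DayOfWeek = weekday(Time);                      % hypothetical leading predictor
Load = 5 + cos(2*pi*(1:n)'/7) + 0.5*randn(n,1); % hypothetical response series
Tbl = timetable(Time,DayOfWeek,Load);

Mdl = directforecaster(Tbl,"Load",Horizon=[1 3], ...
    LeadingPredictors="all",LeadingPredictorLags={0:1}, ...
    ResponseLags=1:2);

step = 3;
i = find(Mdl.Horizon == step);                         % position of step in Mdl.Horizon
mask = logical(Mdl.PreparedPredictorsPerHorizon(i,:)); % predictors flagged for this step

prep = preparedPredictors(Mdl,Tbl,HorizonStep=step);
numFlagged = nnz(mask)
numColumns = width(prep)
```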

    Output Arguments

    preparedX: Prepared predictor data used for training or testing at the specified horizon step, returned as a numeric matrix, table, or timetable. preparedX has the same data type as the data used to train Mdl and is of size n-by-p, where n is the number of observations in Tbl, X, or Y, and p is the number of prepared predictors at the specified horizon step.

    Limitations

    • When you use the preparedPredictors object function, the data set must contain at least Mdl.MaxLag + max(Mdl.Horizon) observations. The software requires these observations for creating lagged and leading predictors.
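A simple guard can check this requirement before calling preparedPredictors. The sketch below uses hypothetical stand-in values so that it runs on its own. In practice, maxLag and horizon would come from Mdl.MaxLag and Mdl.Horizon, and numObs from the height of the data.

```matlab
% Hypothetical values standing in for Mdl.MaxLag, Mdl.Horizon, and the data size.
maxLag = 7;
horizon = 1:7;
numObs = 12;

% Minimum number of observations stated in the limitation.
minObs = maxLag + max(horizon);

if numObs < minObs
    fprintf("Need at least %d observations, but the data has %d.\n", ...
        minObs,numObs)
else
    fprintf("The data has enough observations (%d >= %d).\n", ...
        numObs,minObs)
end
```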

    Version History

    Introduced in R2023b