After training a model in Regression Learner, check the History list to see which model has the best overall score. The best score is highlighted in a box. This score is the root mean square error (RMSE) on the validation set. The score estimates the performance of the trained model on new data. Use the score to help you choose the best model.
For cross-validation, the score is the RMSE computed over all observations, where each observation's prediction comes from the model trained without the fold that contains that observation.
For holdout validation, the score is the RMSE on the held-out observations.
For no validation, the score is the resubstitution RMSE on all the training data.
If you accepted the defaults when you imported data into the app, you are using cross-validation. To learn more, see Choose Validation Scheme.
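The following minimal Python sketch makes the three scoring schemes concrete. It is not how Regression Learner is implemented. The straight-line least-squares model (`fit_linear`) is a hypothetical stand-in for whatever model you train. The fold assignment (`i % k`) and the holdout split (the last observations) are deterministic simplifications of the random partitioning a real tool typically uses. The key point is in `cross_validated_rmse`: every observation is predicted exactly once, by a model that never saw it during training.

```python
import math


def fit_linear(xs, ys):
    """Hypothetical stand-in model: least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def cross_validated_rmse(xs, ys, k):
    """k-fold CV: each observation is predicted by a model trained without its fold."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        # Deterministic fold assignment (an assumption; real tools usually shuffle).
        train = [i for i in range(n) if i % k != fold]
        model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # the held-out observations of this fold
            preds[i] = model(xs[i])
    # Pool the held-out predictions from all folds into a single RMSE.
    return rmse(ys, preds)


def holdout_rmse(xs, ys, holdout_fraction):
    """Holdout: train on the first observations, score on the held-out rest."""
    n_test = max(1, round(len(xs) * holdout_fraction))
    split = len(xs) - n_test
    model = fit_linear(xs[:split], ys[:split])
    return rmse(ys[split:], [model(x) for x in xs[split:]])


def resubstitution_rmse(xs, ys):
    """No validation: train and score on the same data."""
    model = fit_linear(xs, ys)
    return rmse(ys, [model(x) for x in xs])


xs = list(range(10))
ys = [1.0, 3.2, 4.9, 7.1, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2]
print(cross_validated_rmse(xs, ys, 5), holdout_rmse(xs, ys, 0.3), resubstitution_rmse(xs, ys))
```

On the same data, the resubstitution RMSE is typically optimistic compared with the cross-validated or holdout RMSE, because it scores the model on the same observations it was trained on.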
The model with the best overall score might not be the best model for your goal. Sometimes a model with a slightly worse overall score (a slightly higher RMSE) is the better model for your goal. For example, you want to avoid overfitting, and you might want to exclude predictors whose data is expensive or difficult to collect.
You can view model statistics in the Current Model window and use these statistics to assess and compare models. The statistics are calculated on the validation set.
| Statistic | Description | Tip |
|---|---|---|
| RMSE | Root mean square error. The RMSE is always nonnegative, and its units match the units of your response. | Look for smaller values of the RMSE. |
| R-Squared | Coefficient of determination. R-squared is at most 1 and usually larger than 0. It compares the trained model with the model where the response is constant and equals the mean of the training response. If your model is worse than this constant model, then R-squared is negative. | Look for an R-squared close to 1. |
| MSE | Mean squared error. The MSE is the square of the RMSE. | Look for smaller values of the MSE. |
| MAE | Mean absolute error. The MAE is always nonnegative and similar to the RMSE, but less sensitive to outliers. | Look for smaller values of the MAE. |
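The four statistics in the table can be computed directly from the true and predicted responses. This Python sketch shows the definitions; the function name `regression_stats` and its `baseline_mean` parameter are hypothetical. Because the table describes R-squared as a comparison with the constant model that predicts the mean of the training response, the sketch lets you pass that training mean explicitly and otherwise falls back to the mean of `y_true`, which is an assumption.

```python
import math


def regression_stats(y_true, y_pred, baseline_mean=None):
    """Compute RMSE, R-squared, MSE, and MAE for paired true/predicted responses."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)        # sum of squared errors
    mse = sse / n                           # mean squared error
    rmse = math.sqrt(mse)                   # same units as the response
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    # Constant model: always predict the (training) mean of the response.
    if baseline_mean is None:
        baseline_mean = sum(y_true) / n
    sst = sum((t - baseline_mean) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("R-squared is undefined when the constant model is exact")
    r_squared = 1 - sse / sst               # negative if worse than the constant model
    return {"RMSE": rmse, "R-Squared": r_squared, "MSE": mse, "MAE": mae}


stats = regression_stats([3, 5, 7, 9], [2, 5, 8, 9])
# RMSE = sqrt(0.5), R-Squared = 0.9, MSE = 0.5, MAE = 0.5
```

Reversing the predictions to `[9, 7, 5, 3]` gives an R-squared of -3, because that model does worse than always predicting the mean. A single large error also raises the RMSE more than the MAE, since the RMSE squares each error before averaging.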
Use the response plot to view the regression model results. After you train a regression model, the response plot displays the predicted response versus the record number. If you are using holdout or cross-validation, these predictions are the predictions on the held-out observations; that is, each prediction comes from a model that was trained without the corresponding observation. To investigate your results, use the controls on the right. You can:
Plot predicted and/or true responses. Use the check boxes under Plot to make your selection.
Show prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box.
Choose the variable to plot on the x-axis under X-axis. You can choose either the record number or one of your predictor variables.
Under Style, plot the response as markers or as a box plot. You can select Box plot only when the variable on the x-axis has few unique values.
A box plot displays the typical values of the response and any possible outliers. The central mark indicates the median, and the bottom and top edges of the box are the 25th and 75th percentiles, respectively. Vertical lines, called whiskers, extend from the boxes to the most extreme data points that are not considered outliers. The outliers are plotted individually.
For more information about box plots, see
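The box plot quantities described above can be sketched in a few lines of Python. Two details are assumptions, not taken from this page: the quartiles use Python's `statistics.quantiles` with `method="inclusive"`, which may interpolate slightly differently from the app, and a point counts as an outlier when it lies more than 1.5 times the interquartile range beyond the box, a common convention. The helper name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Summarize values the way a box plot draws them (needs at least two values)."""
    # 25th, 50th, and 75th percentiles: box bottom, central mark, box top.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points that are not outliers.
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        # Points outside the fences are drawn individually.
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

For this input the box spans 3 to 7 with a median of 5, the whiskers reach 1 and 8, and 100 is reported as the only outlier.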
Use the Predicted vs. Actual plot to check model performance. Use this plot to understand how well the regression model makes predictions for different response values. To view the Predicted vs. Actual plot after training a model, on the Regression Learner tab, in the Plots section, click Predicted vs. Actual Plot.
When you open the plot, the predicted response of your model is plotted against the actual, true response. A perfect regression model has a predicted response equal to the true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, and so the predictions are scattered near the line.
Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model. Try training a different model type or making your current model type more flexible using the Advanced options in the Model Type section. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor.
Use the residuals plot to check model performance. To view the residuals plot after training a model, on the Regression Learner tab, in the Plots section, click Residuals Plot. The residuals plot displays the difference between the predicted and true responses. Under X-axis, choose the variable to plot on the x-axis: the true response, the predicted response, the record number, or one of your predictors.
Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model. Look for these patterns:
Residuals are not symmetrically distributed around 0.
Residuals change significantly in size from left to right in the plot.
Outliers occur; that is, some residuals are much larger than the rest of the residuals.
A clear, nonlinear pattern appears in the residuals.
Try training a different model type, or making your current model type more flexible using the Advanced options in the Model Type section. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor.
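As a rough numeric companion to inspecting the residuals plot by eye, the following Python sketch computes one hypothetical indicator for each residual pattern listed earlier. None of this is part of Regression Learner. The function name, the half-versus-half spread ratio, the outlier rule (an absolute residual more than `outlier_factor` times the median absolute residual), and the use of the number of same-sign runs as a hint of nonlinear structure are all illustrative assumptions. The sketch orders residuals by the x-axis variable, as in the plot.

```python
import math
import statistics


def residual_diagnostics(xs, residuals, outlier_factor=5.0):
    """Hypothetical heuristics for common residual patterns (needs >= 4 points)."""
    # Order residuals by the x-axis variable, left to right as in the plot.
    ordered = [r for _, r in sorted(zip(xs, residuals))]
    half = len(ordered) // 2

    # 1. Symmetry around 0: mean residual and sign counts.
    mean_residual = statistics.fmean(ordered)
    n_positive = sum(1 for r in ordered if r > 0)
    n_negative = sum(1 for r in ordered if r < 0)

    # 2. Change in size from left to right: spread of right half vs. left half.
    left_sd = statistics.pstdev(ordered[:half])
    right_sd = statistics.pstdev(ordered[half:])
    spread_ratio = right_sd / left_sd if left_sd > 0 else math.inf

    # 3. Outliers: residuals far larger than a typical absolute residual.
    typical = statistics.median(abs(r) for r in ordered)
    outliers = [r for r in ordered if abs(r) > outlier_factor * typical]

    # 4. Nonlinear pattern: few long runs of same-sign residuals suggest curvature.
    signs = [r > 0 for r in ordered if r != 0]
    runs = (1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)) if signs else 0

    return {
        "mean_residual": mean_residual,
        "n_positive": n_positive,
        "n_negative": n_negative,
        "spread_ratio": spread_ratio,
        "outliers": outliers,
        "sign_runs": runs,
    }


xs = list(range(10))
curved = [(x - 4.5) ** 2 - 8.25 for x in xs]  # U-shaped residuals
print(residual_diagnostics(xs, curved))
```

For these U-shaped residuals the mean residual is 0 and there are no outliers, but the ten residuals form only 3 same-sign runs, which is the kind of clear, nonlinear pattern that suggests trying a more flexible model.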