testckfold
Compare accuracies of two classification models by repeated cross-validation
Syntax
Description
testckfold statistically assesses the accuracies of two classification models by repeatedly cross-validating the two models, determining the differences in the classification loss, and then formulating the test statistic by combining the classification loss differences. This type of test is particularly appropriate when sample size is limited.

You can assess whether the accuracies of the classification models are different, or whether one classification model performs better than another. Available tests include a 5-by-2 paired t test, a 5-by-2 paired F test, and a 10-by-10 repeated cross-validation t test. For more details, see Repeated Cross-Validation Tests. To speed up computations, testckfold supports parallel computing (requires a Parallel Computing Toolbox™ license).
h = testckfold(C1,C2,X1,X2) returns the test decision that results from conducting a 5-by-2 paired F cross-validation test. The null hypothesis is that the classification models C1 and C2 have equal accuracy in predicting the true class labels using the predictor and response data in the tables X1 and X2. h = 1 indicates rejection of the null hypothesis at the 5% significance level.
testckfold conducts the cross-validation test by applying C1 and C2 to all predictor variables in X1 and X2, respectively. The true class labels in X1 and X2 must be the same. The response variable names in X1, X2, C1.ResponseName, and C2.ResponseName must be the same.

For examples of ways to compare models, see Tips.
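For instance, the following sketch compares a classification tree against a k-nearest neighbor classifier trained on the same table of data; the Fisher iris data and the two particular models are illustrative assumptions, not part of the syntax:

% Sketch (illustrative data and models): compare a classification tree
% against a k-nearest neighbor classifier using the same table of data.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;            % response variable stored in the table

C1 = fitctree(Tbl,'Species');     % first classification model
C2 = fitcknn(Tbl,'Species');      % second classification model

% 5-by-2 paired F cross-validation test of equal predictive accuracy
h = testckfold(C1,C2,Tbl,Tbl)

Because both models are full classification models whose ResponseName is the response variable in the tables, no separate vector of class labels is needed in this call.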
h = testckfold(___,Name,Value) uses any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify the type of alternative hypothesis, the type of test, or the use of parallel computing.
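For instance, assuming C1 and C2 are classification model templates and X1, X2, and Y are predictor data and class labels that are already defined, a call along these lines requests a one-sided alternative hypothesis and parallel computation; the statset-based 'Options' structure shown is the usual mechanism for parallel settings in Statistics and Machine Learning Toolbox functions and is used here as an assumption:

% Sketch: one-sided test with parallel computation (illustrative options).
opts = statset('UseParallel',true);
h = testckfold(C1,C2,X1,X2,Y,'Alternative','less','Options',opts)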
Examples
Input Arguments
Output Arguments
More About
Tips
Examples of ways to compare models include:
Compare the accuracies of a simple classification model and a more complex model by passing the same set of predictor data.
Compare the accuracies of two different models using two different sets of predictors.
Perform various types of feature selection. For example, you can compare the accuracy of a model trained using a set of predictors to the accuracy of one trained on a subset or different set of predictors. You can arbitrarily choose the set of predictors, or use a feature selection technique like PCA or sequential feature selection (see pca and sequentialfs).
If X1 and X2 are tables that contain the response variable and C1 and C2 are full classification models whose ResponseName properties name that variable, then you can omit supplying Y. Consequently, testckfold uses the common response variable in the tables.

One way to perform cost-insensitive feature selection is as follows (a code sketch appears after these steps):
1. Create a classification model template that characterizes the first classification model (C1).
2. Create a classification model template that characterizes the second classification model (C2).
3. Specify two predictor data sets. For example, specify X1 as the full predictor set and X2 as a reduced set.
4. Enter testckfold(C1,C2,X1,X2,Y,'Alternative','less'). If testckfold returns 1, then there is enough evidence to suggest that the classification model that uses fewer predictors performs better than the model that uses the full predictor set.
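A minimal sketch of these steps, assuming the Fisher iris data and identical tree templates (so that only the predictor sets differ), might look like this:

% Sketch of the workflow above; the iris data, the tree templates, and
% the particular reduced predictor set are illustrative assumptions.
load fisheriris                   % meas (150-by-4 predictors), species (labels)

C1 = templateTree;                % step 1: first model template
C2 = templateTree;                % step 2: second model template

X1 = meas;                        % step 3: full predictor set
X2 = meas(:,3:4);                 % step 3: reduced predictor set

% Step 4: one-sided test. h = 1 suggests the model trained on the
% reduced predictor set performs better than the full-set model.
h = testckfold(C1,C2,X1,X2,species,'Alternative','less')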
Alternatively, you can assess whether there is a significant difference between the accuracies of the two models. To perform this assessment, remove the 'Alternative','less' specification in step 4. testckfold conducts a two-sided test, and h = 0 indicates that there is not enough evidence to suggest a difference in the accuracy of the two models.

The tests are appropriate for the misclassification rate classification loss, but you can specify other loss functions (see LossFun). The key assumptions are that the estimated classification losses are independent and normally distributed with mean 0 and finite common variance under the two-sided null hypothesis. Classification losses other than the misclassification rate can violate this assumption.

Highly discrete data, imbalanced classes, and highly imbalanced cost matrices can violate the normality assumption of classification loss differences.
Algorithms
If you specify the 10-by-10 repeated cross-validation t test using 'Test','10x10t', then testckfold uses 10 degrees of freedom for the t distribution to find the critical region and estimate the p-value. For more details, see [2] and [3].
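For example, assuming C1, C2, X1, X2, and Y are already defined, a call like the following requests this test (the second output, p, is the p-value):

% Sketch: request the 10-by-10 repeated cross-validation t test
% (assumes C1, C2, X1, X2, and Y are already defined).
[h,p] = testckfold(C1,C2,X1,X2,Y,'Test','10x10t')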
Alternatives
Use testcholdout:
For test sets with larger sample sizes
To implement variants of the McNemar test to compare two classification model accuracies
For cost-sensitive testing using a chi-square or likelihood ratio test. The chi-square test uses quadprog (Optimization Toolbox), which requires an Optimization Toolbox™ license.
References
[1] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 11, No. 8, 1999, pp. 1885–1892.
[2] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.” International Conference on Machine Learning, 2003, pp. 51–58.
[3] Bouckaert, R., and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.” Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference, 2004, pp. 3–12.
[4] Dietterich, T. “Approximate statistical tests for comparing supervised classification learning algorithms.” Neural Computation, Vol. 10, No. 7, 1998, pp. 1895–1923.
[5] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Ed. New York: Springer, 2008.
Extended Capabilities
Version History
Introduced in R2015a
See Also
testcholdout
| templateECOC
| templateEnsemble
| templateDiscriminant
| templateTree
| templateSVM
| templateNaiveBayes
| templateKNN