ClassificationKNN class

k-nearest neighbor classification


Description

A nearest-neighbor classification object in which both the distance metric and the number of neighbors can be altered. The object classifies new observations using the predict method. Because the object stores the training data, it can also compute resubstitution predictions.


Construction

mdl = fitcknn(X,y) creates a k-nearest neighbor classification model.

mdl = fitcknn(X,y,Name,Value) creates a classifier with additional options specified by one or more Name,Value pair arguments. For details, see fitcknn.
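For example, a minimal sketch of constructing a classifier with a few name-value options (the particular option values here are illustrative):

```matlab
% Train a 3-nearest-neighbor classifier on Fisher's iris data,
% standardizing the predictors and using city block distance.
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',3, ...
    'Distance','cityblock','Standardize',true);
```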

Input Arguments


X — Predictor values
numeric matrix

Predictor values, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation.

Data Types: single | double

y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of strings

Classification values, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of strings, with the same number of rows as X. Each row of y represents the classification of the corresponding row of X.

Data Types: single | double | cell | logical | char



Properties

BreakTies

String specifying the method predict uses to break ties if multiple classes have the same smallest cost. By default, ties occur when multiple classes have the same number of nearest points among the K nearest neighbors.

  • 'nearest' — Use the class with the nearest neighbor among tied groups.

  • 'random' — Use a random tiebreaker among tied groups.

  • 'smallest' — Use the smallest index among tied groups.

'BreakTies' applies when 'IncludeTies' is false.

Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.
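For example, assuming mdl is a trained ClassificationKNN model:

```matlab
% Resolve cost ties by choosing randomly among the tied classes
mdl.BreakTies = 'random';
```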


CategoricalPredictors

Specification of which predictors are categorical.

  • 'all' — All predictors are categorical.

  • [] — No predictors are categorical.


ClassNames

List of elements in the training data Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of strings. ClassNames has the same data type as the data in the argument Y.

Change ClassNames using dot notation: mdl.ClassNames = newClassNames.


Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

Change the Cost matrix using dot notation: mdl.Cost = costMatrix.
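For example, a sketch for a two-class problem in which misclassifying a true class 1 observation is five times as costly as the reverse error (mdl is assumed to be an existing two-class model):

```matlab
% Rows are true classes, columns are predicted classes.
costMatrix = [0 5; ...   % true class 1: predicting class 2 is costly
              1 0];      % true class 2
mdl.Cost = costMatrix;
```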


Distance

String or function handle specifying the distance metric. The allowable strings depend on the NSMethod parameter, which you set in fitcknn, and which exists as a field in ModelParameters.

NSMethod        Distance Metric Names
'exhaustive'    Any distance metric of ExhaustiveSearcher
'kdtree'        'cityblock', 'chebychev', 'euclidean', or 'minkowski'

For definitions, see Distance Metrics.

The distance metrics of ExhaustiveSearcher:

  • 'cityblock' — City block distance.

  • 'chebychev' — Chebychev distance (maximum coordinate difference).

  • 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).

  • 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).

  • 'euclidean' — Euclidean distance.

  • 'hamming' — Hamming distance, percentage of coordinates that differ.

  • 'jaccard' — One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.

  • 'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, use the 'Cov' name-value pair.

  • 'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair.

  • 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, use the 'Scale' name-value pair.

  • 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

  • @distfun — Distance function handle. distfun has the form

    function D2 = distfun(ZI,ZJ)
    % calculation of distance

    where:

      • ZI is a 1-by-N vector containing one row of X or Y.

      • ZJ is an M2-by-N matrix containing multiple rows of X or Y.

      • D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).

Change Distance using dot notation: mdl.Distance = newDistance.

If NSMethod is kdtree, you can use dot notation to change Distance only among the types 'cityblock', 'chebychev', 'euclidean', or 'minkowski'.
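As a sketch, a custom metric can be supplied as a function handle; the weighted Euclidean distance below and its weight vector w are illustrative:

```matlab
% Custom distance: ZI is 1-by-N, ZJ is M2-by-N, result is M2-by-1.
w = [1 1 2 2];                        % illustrative per-coordinate weights
weightedDist = @(ZI,ZJ) sqrt(sum(bsxfun(@times,w, ...
    bsxfun(@minus,ZI,ZJ).^2),2));
mdl = fitcknn(X,Y,'Distance',weightedDist);   % X, Y: training data
```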


DistanceWeight

String or function handle specifying the distance weighting function.

'equal'             No weighting
'inverse'           Weight is 1/distance
'inversesquared'    Weight is 1/distance^2
@fcn                fcn is a function that accepts a matrix of nonnegative distances, and returns a matrix the same size containing nonnegative distance weights. For example, 'inversesquared' is equivalent to @(d)d.^(-2).

Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.
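For example, 'inversesquared' weighting can equivalently be set as a function handle (mdl is an existing model):

```matlab
% Weight each neighbor by the inverse square of its distance
mdl.DistanceWeight = @(d) d.^(-2);
```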


DistParameter

Additional parameter for the distance metric.

Distance Metric    Parameter
'mahalanobis'      Positive definite covariance matrix C.
'minkowski'        Minkowski distance exponent, a positive scalar.
'seuclidean'       Vector of positive scale values with length equal to the number of columns of X.

For values of the distance metric other than those in the table, DistParameter must be [].

You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is 'mahalanobis' or 'seuclidean', then you cannot alter DistParameter.


IncludeTies

Logical value indicating whether predict includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly K neighbors (see 'BreakTies').

Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.


ModelParameters

Parameters used in training mdl.


Mu

Numeric vector of predictor means with length numel(PredictorNames).

If you did not standardize the predictor data when training mdl using fitcknn, then Mu is empty ([]).


NumNeighbors

Positive integer specifying the number of nearest neighbors in X to find for classifying each point when predicting. Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors.


NumObservations

Number of observations used in training mdl. This can be less than the number of rows in the training data, because data rows containing NaN values are not part of the fit.


PredictorNames

Cell array of names for the predictor variables, in the order in which they appear in the training data X. Change PredictorNames using dot notation: mdl.PredictorNames = newPredictorNames.


Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Add or change a Prior vector using dot notation: mdl.Prior = priorVector.


ResponseName

String describing the response variable Y. Change ResponseName using dot notation: mdl.ResponseName = newResponseName.


Sigma

Numeric vector of predictor standard deviations with length numel(PredictorNames).

If you did not standardize the predictor data when training mdl using fitcknn, then Sigma is empty ([]).


W

Numeric vector of nonnegative weights with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y.


X

Numeric matrix of unstandardized predictor values. Each column of X represents one predictor (variable), and each row represents one observation.


Y

A numeric vector, vector of categorical variables, logical vector, character array, or cell array of strings, with the same number of rows as X.

Y is of the same type as the passed-in Y data.


Methods

compareHoldout    Compare accuracies of two models using new data
crossval          Cross-validated k-nearest neighbor classifier
edge              Edge of k-nearest neighbor classifier
loss              Loss of k-nearest neighbor classifier
margin            Margin of k-nearest neighbor classifier
predict           Predict k-nearest neighbor classification
resubEdge         Edge of k-nearest neighbor classifier by resubstitution
resubLoss         Loss of k-nearest neighbor classifier by resubstitution
resubMargin       Margin of k-nearest neighbor classifier by resubstitution
resubPredict      Predict resubstitution response of k-nearest neighbor classifier



Prediction

ClassificationKNN predicts the classification of a point Xnew using a procedure equivalent to this:

  1. Find the NumNeighbors points in the training set X that are nearest to Xnew.

  2. Find the NumNeighbors response values Y of those nearest points.

  3. Assign the classification label Ynew that has the smallest expected misclassification cost among the values in Y.

For details, see Posterior Probability and Expected Cost in the predict documentation.
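Under equal distance weights and the default zero-one cost, the three steps above reduce to a majority vote among the nearest neighbors, which can be sketched as follows (variable names are illustrative; X and Y are the training data):

```matlab
% Naive prediction for a single query point xnew (1-by-N vector).
k = 5;
d = sqrt(sum((X - xnew).^2, 2));   % Euclidean distance to each training row
[~,idx] = sort(d);                 % step 1: order points by distance
neighborLabels = Y(idx(1:k));      % step 2: responses of the k nearest
ynew = mode(categorical(neighborLabels));   % step 3: majority vote
```

Note that (X - xnew) relies on implicit expansion; in older MATLAB releases, use bsxfun(@minus,X,xnew) instead.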

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.


Examples

Train a k-Nearest Neighbor Classifier

Construct a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Y is a cell array of strings that contains the corresponding iris species.

Train a 5-nearest neighbors classifier. It is good practice to standardize noncategorical predictor data.

Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)
Mdl = 

     PredictorNames: {'x1'  'x2'  'x3'  'x4'}
       ResponseName: 'Y'
         ClassNames: {'setosa'  'versicolor'  'virginica'}
     ScoreTransform: 'none'
    NumObservations: 150
           Distance: 'euclidean'
       NumNeighbors: 5

Mdl is a trained ClassificationKNN classifier, and some of its properties display in the Command Window.

To access the properties of Mdl, use dot notation. For example, display the class names and the class prior probabilities:

Mdl.ClassNames
ans = 

    'setosa'
    'versicolor'
    'virginica'

Mdl.Prior
ans =

    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which are settable using the name-value pair argument 'Prior' in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3 respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to, for example, predict to label new measurements, or crossval to cross-validate the classifier.
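For example (Xnew is an illustrative matrix of new observations with the same columns as X):

```matlab
labels = predict(Mdl,Xnew);   % classify new measurements
CVMdl  = crossval(Mdl);       % 10-fold cross-validation by default
err    = kfoldLoss(CVMdl);    % estimated generalization error
```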


Alternatives

knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in Classify Query Data. If you want to perform classification, ClassificationKNN can be more convenient, in that you can construct a classifier in one step and classify new observations in subsequent steps. Also, ClassificationKNN has cross-validation options.
