edge

Edge of k-nearest neighbor classifier

Description

E = edge(mdl,Tbl,ResponseVarName) returns the classification edge for mdl with data Tbl and classification Tbl.ResponseVarName. If Tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName.

The classification edge (E) is a scalar value that represents the mean of the classification margins.

E = edge(mdl,Tbl,Y) returns the classification edge for mdl with data Tbl and classification Y.

E = edge(mdl,X,Y) returns the classification edge for mdl with data X and classification Y.

E = edge(___,'Weights',weights) computes the edge with additional observation weights weights, using any of the input arguments in the previous syntaxes.

Note

If the predictor data X or the predictor variables in Tbl contain any missing values, the edge function can return NaN. For more details, see "edge can return NaN for predictor data with missing values."

Examples

Create a k-nearest neighbor classifier for the Fisher iris data, where k = 5.

Load the Fisher iris data set.

load fisheriris
X = meas;
Y = species;

Create a classifier for five nearest neighbors.

mdl = fitcknn(X,Y,'NumNeighbors',5);

Compute the edge of the classifier for three new observations: the columnwise minimum, mean, and maximum of X, labeled 'setosa', 'versicolor', and 'virginica', respectively.

NewX = [min(X);mean(X);max(X)];
Y = {'setosa';'versicolor';'virginica'};
E = edge(mdl,NewX,Y)
E = 1

All five nearest neighbors of each NewX point classify as the corresponding Y entry, so every classification margin is 1 and their mean, the edge, is also 1.
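The relationship between this edge and the individual margins can be checked with the margin function. The following is a minimal sketch, not part of the original example; the variable name NewY is introduced here only to avoid overwriting Y. Because the source reports that every margin in this example is 1, any weighted mean of the margins coincides with their plain mean, so the two displayed values are expected to match.

```matlab
% Sketch: relate the edge to the per-observation margins.
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);

NewX = [min(meas);mean(meas);max(meas)];
NewY = {'setosa';'versicolor';'virginica'};   % illustrative name (assumption)

m = margin(mdl,NewX,NewY);   % column vector, one margin per row of NewX
E = edge(mdl,NewX,NewY);     % scalar summary of the margins

% Display the plain mean of the margins next to the edge.
disp([mean(m) E])
```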

Input Arguments

mdl: k-nearest neighbor classifier model, specified as a ClassificationKNN object.

Tbl: Sample data on which to compute the edge, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If Tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName or Y.

If you train mdl using sample data contained in a table, then the input data for edge must also be in a table.

Data Types: table

ResponseVarName: Response variable name, specified as the name of a variable in Tbl. If Tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName.

You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable is stored as Tbl.response, then specify it as 'response'. Otherwise, the software treats all columns of Tbl, including Tbl.response, as predictors.

The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.

Data Types: char | string
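The table-based syntaxes described above can be sketched as follows. This is an illustrative example only: the table Tbl, its variable names (SL, SW, PL, PW, Species), and the output names E1 through E3 are assumptions made for this sketch and do not come from the original page.

```matlab
% Sketch: call edge with table input (all names here are illustrative).
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;                 % response stored in the table

% Train on the table, so edge must also receive a table.
mdl = fitcknn(Tbl,'Species','NumNeighbors',5);

E1 = edge(mdl,Tbl,'Species');          % response given by name
E2 = edge(mdl,Tbl);                    % name omitted: Tbl holds the training response
E3 = edge(mdl,Tbl(:,1:4),species);     % predictors in a table, labels passed as Y
```

All three calls evaluate the same observations and labels, so they are expected to return the same value.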

X: Predictor data, specified as a numeric matrix. Each row of X represents one observation, and each column represents one variable.

Data Types: single | double

Y: Class labels, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

Data Types: categorical | char | string | logical | single | double | cell

weights: Observation weights, specified as a numeric vector or the name of a variable in Tbl.

If you specify weights as a numeric vector, then the size of weights must be equal to the number of rows in X or Tbl.

If you specify weights as the name of a variable in Tbl, then the name must be a character vector or string scalar. For example, if the weights are stored as Tbl.w, then specify weights as 'w'. Otherwise, the software treats all columns of Tbl, including Tbl.w, as predictors.

If you specify weights, then the edge function weights the observation in each row of X or Tbl with the corresponding weight in weights.

Example: 'Weights','w'

Data Types: single | double | char | string
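As a brief illustration of the 'Weights' argument, the sketch below passes a numeric weight vector with one entry per row of X. The weight values are arbitrary and chosen only for demonstration; the variable names w, Eplain, and Eweighted are assumptions of this sketch.

```matlab
% Sketch: compute the edge with and without observation weights.
load fisheriris
X = meas;
Y = species;
mdl = fitcknn(X,Y,'NumNeighbors',5);

% One weight per observation; here 'virginica' rows count double (arbitrary choice).
w = ones(size(X,1),1);
w(strcmp(Y,'virginica')) = 2;

Eplain    = edge(mdl,X,Y);               % default weights
Eweighted = edge(mdl,X,Y,'Weights',w);   % weighted edge
```

When the weights are stored in a table variable instead, the page's own example form 'Weights','w' applies.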

More About

Margin

The classification margin for each observation is the difference between the classification score for the true class and the maximal classification score for the false classes.

The classification margins form a column vector with the same number of rows as X or Tbl.
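In symbols, the two definitions on this page can be written as below. The notation is an assumption introduced for this sketch: s_i(k) denotes the classification score of observation i for class k, y_i its true class, and n the number of rows in X or Tbl. The edge is shown as the plain mean stated earlier on this page; when 'Weights' is specified, the mean becomes a weighted one, and the exact normalization is not given here.

```latex
m_i = s_i(y_i) - \max_{k \neq y_i} s_i(k),
\qquad
E = \frac{1}{n} \sum_{i=1}^{n} m_i
```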

Score

The score of a classification is the posterior probability of the classification. The posterior probability is the number of neighbors with that classification divided by the number of neighbors. For a more detailed definition that includes weights and prior probabilities, see Posterior Probability.
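To make the link between scores, margins, and the edge concrete, the following sketch recomputes the margins by hand from the scores returned by predict. It is an illustration under stated assumptions, not the function's internal implementation: it uses a plain (unweighted) mean, and the names NewY, trueIdx, and m are introduced here.

```matlab
% Sketch: rebuild margins from posterior-probability scores.
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);

NewX = [min(meas);mean(meas);max(meas)];
NewY = {'setosa';'versicolor';'virginica'};   % illustrative name (assumption)

% Columns of score follow the order of mdl.ClassNames.
[~,score] = predict(mdl,NewX);
[~,trueIdx] = ismember(NewY,mdl.ClassNames);

m = zeros(size(NewX,1),1);
for i = 1:size(NewX,1)
    s = score(i,:);
    trueScore = s(trueIdx(i));      % score of the true class
    s(trueIdx(i)) = -Inf;           % exclude the true class from the max
    m(i) = trueScore - max(s);      % margin for observation i
end

E = mean(m);   % plain mean of the margins (assumes equal weights)
```

For the three observations in the example on this page, all five neighbors belong to the true class, so each true-class score is 1 and each false-class score is 0.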

Version History

Introduced in R2012a
