spectralcluster

Spectral clustering

collapse all in page

Syntax

idx = spectralcluster(X,k)

idx = spectralcluster(S,k,'Distance','precomputed')

idx = spectralcluster(___,Name,Value)

[idx,V] = spectralcluster(___)

[idx,V,D] = spectralcluster(___)

Description

idx = spectralcluster(X,k) partitions observations in the n-by-p data matrix X into k clusters using the spectral clustering algorithm (see Algorithms). spectralcluster returns an n-by-1 vector idx containing cluster indices of each observation.

example

idx = spectralcluster(S,k,'Distance','precomputed') returns a vector of cluster indices for S, the similarity matrix (or adjacency matrix) of a similarity graph. S can be the output of adjacency.

To use a similarity matrix as the first input, you must specify 'Distance','precomputed'.

example

idx = spectralcluster(___,Name,Value) specifies additional options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, you can specify 'SimilarityGraph','epsilon' to construct a similarity graph using the radius search method.

example

[idx,V] = spectralcluster(___) also returns the eigenvectors V corresponding to the k smallest eigenvalues of the Laplacian matrix.

[idx,V,D] = spectralcluster(___) also returns a vector D containing the k smallest eigenvalues of the Laplacian matrix.

example

Examples

collapse all

Perform Spectral Clustering on Input Data

Open Live Script

Cluster a 2-D circular data set using spectral clustering with the default Euclidean distance metric.

Generate synthetic data that contains two noisy circles.

rng('default') % For reproducibility

% Parameters for data generation
N = 300;  % Size of each cluster
r1 = 2;   % Radius of first circle
r2 = 4;   % Radius of second circle
theta = linspace(0,2*pi,N)';

X1 = r1*[cos(theta),sin(theta)]+ rand(N,1); 
X2 = r2*[cos(theta),sin(theta)]+ rand(N,1);
X = [X1;X2]; % Noisy 2-D circular data set

Find two clusters in the data by using spectral clustering.

idx = spectralcluster(X,2);

Visualize the result of clustering.

gscatter(X(:,1),X(:,2),idx);

Figure contains an axes object. The axes object contains 2 objects of type line. One or more of the lines displays its values using only markers These objects represent 1, 2.

The spectralcluster function correctly identifies the two clusters in the data set.

Perform Spectral Clustering on Similarity Matrix

Open Live Script

Compute a similarity matrix from Fisher's iris data set and perform spectral clustering on the similarity matrix.

Load Fisher's iris data set. Use the petal lengths and widths as features to consider for clustering.

load fisheriris
X = meas(:,3:4);
gscatter(X(:,1),X(:,2),species);

Figure contains an axes object. The axes object contains 3 objects of type line. One or more of the lines displays its values using only markers These objects represent setosa, versicolor, virginica.

Find the distance between each pair of observations in X by using the pdist and squareform functions with the default Euclidean distance metric.

dist_temp = pdist(X);
dist = squareform(dist_temp);

Construct the similarity matrix and confirm that it is symmetric.

S = exp(-dist.^2);
issymmetric(S)

ans = logical
   1

Perform spectral clustering. Specify 'Distance','precomputed' to perform clustering using the similarity matrix. Specify k=3 clusters, and set the 'LaplacianNormalization' name-value pair argument to use the normalized symmetric Laplacian matrix.

k = 3; % Number of clusters
rng('default') % For reproducibility
idx = spectralcluster(S,k,'Distance','precomputed','LaplacianNormalization','symmetric');

idx contains the cluster indices for each observation in X.

Visualize the result of clustering.

gscatter(X(:,1),X(:,2),idx);

Figure contains an axes object. The axes object contains 3 objects of type line. One or more of the lines displays its values using only markers These objects represent 1, 2, 3.

Tabulate the clustering results.

tabulate(idx)

  Value    Count   Percent
      1       48     32.00%
      2       50     33.33%
      3       52     34.67%

The Percent column shows the percentage of data points assigned to the three clusters.

Repeat spectral clustering using the data as input to spectralcluster. Specify 'NumNeighbors' as size(X,1), which corresponds to creating the similarity matrix S by connecting each point to all the remaining points.

idx2 = spectralcluster(X,k,'NumNeighbors',size(X,1),'LaplacianNormalization','symmetric');
gscatter(X(:,1),X(:,2),idx2);

Figure contains an axes object. The axes object contains 3 objects of type line. One or more of the lines displays its values using only markers These objects represent 1, 2, 3.

tabulate(idx2)

  Value    Count   Percent
      1       50     33.33%
      2       52     34.67%
      3       48     32.00%

The clustering results for both approaches are the same. The order of cluster assignments is different, even though the data points are clustered in the same way.

Cluster Using Radius Search for Similarity Graph

Open Live Script

Find clusters in a data set, based on a specified search radius for creating a similarity graph.

Create data with 3 clusters, each containing 500 points.

rng('default') % For reproducibility
N = 500;
X = [mvnrnd([0 0],eye(2),N); ...
    mvnrnd(5*[1 -1],eye(2),N); ...
    mvnrnd(5*[1 1],eye(2),N)];

Specify a search radius of 2 for creating a similarity graph, and find 3 clusters in the data.

idx = spectralcluster(X,3,'SimilarityGraph','epsilon','Radius',2);

Visualize the result of clustering.

gscatter(X(:,1),X(:,2),idx);

Figure contains an axes object. The axes object contains 3 objects of type line. One or more of the lines displays its values using only markers These objects represent 1, 2, 3.

Find Eigenvalues and Eigenvectors of Laplacian Matrix

Open Live Script

Find the eigenvalues and eigenvectors of the Laplacian matrix and use the values to confirm clustering results.

Randomly generate sample data with three well-separated clusters, each containing 100 points.

rng('default'); % For reproducibility
n = 100;
X = [randn(n,2)*0.5+3;
    randn(n,2)*0.5
    randn(n,2)*0.5-3];

Estimate the number of clusters in the data by using the eigenvalues of the Laplacian matrix. Compute the five smallest eigenvalues (in magnitude) of the Laplacian matrix.

[~,~,D_temp] = spectralcluster(X,5)

D_temp = 5×1

   -0.0000
   -0.0000
   -0.0000
    0.0277
    0.0296

Only the first three eigenvalues are approximately zero. The number of zero eigenvalues is a good indicator of the number of connected components in a similarity graph and, therefore, is a good estimate of the number of clusters in your data. So, k=3 is a good estimate of the number of clusters in X.

Find k=3 clusters and return the three smallest eigenvalues and corresponding eigenvectors of the Laplacian matrix.

[idx,V,D] = spectralcluster(X,3)

idx = 300×1

     1
     1
     1
     1
     1
     1
     1
     1
     1
     1
      ⋮

V = 300×3

   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
   -0.0000   -0.0000   -0.1000
      ⋮

D = 3×1
10^-16 ×

   -0.3848
   -0.4734
   -0.5171

Elements of D correspond to the three smallest eigenvalues of the Laplacian matrix. The columns of V contain the eigenvectors corresponding to the eigenvalues in D. For well-separated clusters, the eigenvectors are indicator vectors. The eigenvectors have values of zero (or close to zero) for points that do not belong to a particular cluster, and nonzero values for points that belong to a particular cluster.

Visualize the result of clustering.

gscatter(X(:,1),X(:,2),idx);

Figure contains an axes object. The axes object contains 3 objects of type line. One or more of the lines displays its values using only markers These objects represent 1, 2, 3.

Input Arguments

collapse all

`X` — Input data
numeric matrix

Input data, specified as an n-by-p numeric matrix. The rows of X correspond to observations (or points), and the columns correspond to variables.

The software treats NaNs in X as missing data and ignores any row of X containing at least one NaN. The spectralcluster function returns NaN values for the corresponding row in the output arguments idx and V.

Data Types: single | double

`S` — Similarity matrix
symmetric matrix

Similarity matrix, specified as an n-by-n symmetric matrix, where n is the number of observations. A similarity matrix (or adjacency matrix) represents the input data by modeling local neighborhood relationships among the data points. The values in a similarity matrix represent the edges (or connections) between nodes (data points) that are connected in a similarity graph. For more information, see Similarity Matrix.

S must not contain any NaN values.

To use a similarity matrix as the first input of spectralcluster, you must specify 'Distance','precomputed'.

Data Types: single | double

`k` — Number of clusters
positive integer

Number of clusters in the data, specified as a positive integer.

For details about how to estimate the number of clusters, see Tips.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: spectralcluster(X,3,'SimilarityGraph','epsilon','Radius',5) specifies 3 clusters and uses the radius search method with a search radius of 5 to construct a similarity graph.

`Distance` — Distance metric
character vector | string scalar | function handle

Distance metric, specified as the comma-separated pair consisting of 'Distance' and a character vector, string scalar, or function handle, as described in this table.

Value	Description
`'precomputed'`	Precomputed distance. You must specify this option if the first input to `spectralcluster` is a similarity matrix `S`.
`'euclidean'`	Euclidean distance (default)
`'seuclidean'`	Standardized Euclidean distance. Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation computed from `X`. Use the `Scale` name-value pair argument to specify a different scaling factor.
`'mahalanobis'`	Mahalanobis distance using the sample covariance of `X`, `C = cov(X,'omitrows')`. Use the `Cov` name-value pair argument to specify a different covariance matrix.
`'cityblock'`	City block distance
`'minkowski'`	Minkowski distance. The default exponent is 2. Use the `P` name-value pair argument to specify a different exponent, where `P` is a positive scalar value.
`'chebychev'`	Chebychev distance (maximum coordinate difference)
`'cosine'`	One minus the cosine of the included angle between observations (treated as vectors)
`'correlation'`	One minus the sample correlation between observations (treated as sequences of values)
`'hamming'`	Hamming distance, which is the percentage of coordinates that differ
`'jaccard'`	One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ
`'spearman'`	One minus the sample Spearman's rank correlation between observations (treated as sequences of values)
`@distfun`	Custom distance function handle. A distance function has the form function D2 = distfun(ZI,ZJ) % calculation of distance ... where `ZI` is a `1`-by-`n` vector containing a single observation. `ZJ` is an `m2`-by-`n` matrix containing multiple observations. `distfun` must accept a matrix `ZJ` with an arbitrary number of observations. `D2` is an `m2`-by-`1` vector of distances, and `D2(k)` is the distance between observations `ZI` and `ZJ(k,:)`. If your data is not sparse, you can generally compute distance more quickly by using a built-in distance instead of a function handle.

For more information, see Distance Metrics.

When you use the 'seuclidean', 'minkowski', or 'mahalanobis' distance metric, you can specify the additional name-value pair argument 'Scale', 'P', or 'Cov', respectively, to control the distance metric.

Example: spectralcluster(X,5,'Distance','minkowski','P',3) specifies 5 clusters and uses of the Minkowski distance metric with an exponent of 3 to perform the clustering algorithm.

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar.

This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',3

Data Types: single | double

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix.

This argument is valid only if 'Distance' is 'mahalanobis'.

Example: 'Cov',eye(4)

Data Types: single | double

`Scale` — Scaling factors for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | numeric vector of nonnegative values

Scaling factors for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a numeric vector of nonnegative values.

Scale has length p (the number of columns in X), because each dimension (column) of X has a corresponding value in Scale. For each dimension of X, spectralcluster uses the corresponding value in Scale to standardize the difference between observations.

This argument is valid only if 'Distance' is 'seuclidean'.

Data Types: single | double

`SimilarityGraph` — Type of similarity graph
`'knn'` (default) | `'epsilon'`

Type of similarity graph to construct from the input data X, specified as the comma-separated pair consisting of 'SimilarityGraph' and one of these values.

Value Description Graph-Specific Name-Value Pair Arguments

Value	Description	Graph-Specific Name-Value Pair Arguments
`'knn'`	(Default) Construct the graph using nearest neighbors.	`'NumNeighbors'` — Number of nearest neighbors used to construct the similarity graph `'KNNGraphType'` — Type of nearest neighbor graph
`'epsilon'`	Construct the graph using a radius search. You must specify a value for `Radius` if you use this option.	`'Radius'` — Search radius for the nearest neighbors used to construct the similarity graph

'knn'

(Default) Construct the graph using nearest neighbors.

'NumNeighbors' — Number of nearest neighbors used to construct the similarity graph

'KNNGraphType' — Type of nearest neighbor graph

'epsilon'

Construct the graph using a radius search. You must specify a value for Radius if you use this option.

'Radius' — Search radius for the nearest neighbors used to construct the similarity graph

For more information, see Similarity Graph.

This argument is valid only if 'Distance' is not 'precomputed'.

Example: 'SimilarityGraph','epsilon'

`NumNeighbors` — Number of nearest neighbors
`log(size(X,1))` (default) | positive integer

Number of nearest neighbors used to construct the similarity graph, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer.

This argument is valid only if 'SimilarityGraph' is 'knn'. For more information, see Similarity Graph.

Example: 'NumNeighbors',10

Data Types: single | double

`KNNGraphType` — Type of nearest neighbor graph
`'complete'` (default) | `'mutual'`

Type of nearest neighbor graph, specified as the comma-separated pair consisting of 'KNNGraphType' and one of these values.

Value Description

Value	Description
`'complete'`	(Default) Connects two points i and j, when either i is a nearest neighbor of j or j is a nearest neighbor of i. This option leads to a denser representation of the similarity matrix.
`'mutual'`	Connects two points i and j, when i is a nearest neighbor of j and j is a nearest neighbor of i. This option leads to a sparser representation of the similarity matrix.

'complete'

(Default) Connects two points i and j, when either i is a nearest neighbor of j or j is a nearest neighbor of i.

This option leads to a denser representation of the similarity matrix.

'mutual'

Connects two points i and j, when i is a nearest neighbor of j and j is a nearest neighbor of i.

This option leads to a sparser representation of the similarity matrix.

This argument is valid only if 'SimilarityGraph' is 'knn'.

Example: 'KNNGraphType','mutual'

`Radius` — Search radius
nonnegative scalar

Search radius for the nearest neighbors used to construct the similarity graph, specified as the comma-separated pair consisting of 'Radius' and a nonnegative scalar.

You must specify this argument if 'SimilarityGraph' is 'epsilon'. For more information, see Similarity Graph.

Example: 'Radius',5

Data Types: single | double

`KernelScale` — Scale factor
`1` (default) | `'auto'` | positive scalar

Scale factor for the kernel, specified as the comma-separated pair consisting of 'KernelScale' and 'auto' or a positive scalar. The software uses the scale factor to transform distances to similarity measures. For more information, see Similarity Graph.

The 'auto' option is supported only for the 'euclidean' and 'seuclidean' distance metrics.
If you specify 'auto', then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. To reproduce results, set a random number seed using rng before calling spectralcluster.

This argument is valid only if 'Distance' is not 'precomputed'.

Example: 'KernelScale','auto'

Data Types: double | single | char | string

`LaplacianNormalization` — Method to normalize Laplacian matrix
`'randomwalk'` (default) | `'symmetric'` | `'none'`

Method to normalize the Laplacian matrix L, specified as the comma-separated pair consisting of 'LaplacianNormalization' and one of these values.

Value Description

Value	Description
`'none'`	Use Laplacian matrix L without normalization.
`'randomwalk'`	(Default) Use the normalized random-walk Laplacian matrix L_rw (Shi-Malik [2]). $L_{r w} = D_{g}^{- 1} L .$ The matrix D_g is the degree matrix.
`'symmetric'`	Use the normalized symmetric Laplacian matrix L_s (Ng-Jordan-Weiss [3]). $L_{s} = D_{g}^{- 1 / 2} L D_{g}^{- 1 / 2} .$ The matrix D_g is the degree matrix.

'none'

Use Laplacian matrix L without normalization.

'randomwalk'

(Default) Use the normalized random-walk Laplacian matrix L_rw (Shi-Malik [2]).

$L_{r w} = D_{g}^{- 1} L .$

The matrix D_g is the degree matrix.

'symmetric'

Use the normalized symmetric Laplacian matrix L_s (Ng-Jordan-Weiss [3]).

$L_{s} = D_{g}^{- 1 / 2} L D_{g}^{- 1 / 2} .$

The matrix D_g is the degree matrix.

For more information, see Laplacian Matrix.

Example: 'LaplacianNormalization','randomwalk'

`ClusterMethod` — Clustering method
`'kmeans'` (default) | `'kmedoids'`

Clustering method to cluster the eigenvectors of the Laplacian matrix, specified as the comma-separated pair consisting of 'ClusterMethod' and either 'kmeans' or 'kmedoids'.

'kmeans' — Perform k-means clustering by using the kmeans function.
'kmedoids' — Perform k-medoids clustering by using the kmedoids function.

kmeans and kmedoids involve randomness in their algorithms. Therefore, to reproduce the results of spectralcluster, you must set the seed of the random number generator by using rng.

Example: 'ClusterMethod','kmedoids'

Output Arguments

collapse all

`idx` — Cluster indices
numeric column vector

Cluster indices, returned as a numeric column vector. idx has n rows, and each row of idx indicates the cluster assignment of the corresponding row (or observation) in X.

Data Types: double

`V` — Eigenvectors
numeric matrix

Eigenvectors, returned as an n-by-k numeric matrix. The columns of V are the eigenvectors corresponding to the k smallest eigenvalues of the Laplacian matrix. These eigenvectors are a low-dimensional representation of the input data X in a new space where clusters are more widely separated.

For well-separated clusters, the eigenvectors are indicator vectors. That is, the eigenvectors have values of zero (or close to zero) for points that do not belong to a given cluster, and nonzero values for points that belong to a particular cluster.

Data Types: single | double

`D` — Eigenvalues
numeric vector

Eigenvalues, returned as a k-by-1 numeric vector that contains the k smallest eigenvalues of the Laplacian matrix. The number of zero eigenvalues in D is an indicator of the number of connected components in the similarity graph and, therefore, is a good estimate of the number of clusters in your data.

Data Types: single | double

More About

collapse all

Similarity Graph

A similarity graph models the local neighborhood relationships between data points in X as an undirected graph. The nodes in the graph represent data points, and the edges, which are directionless, represent the connections between the data points.

If the pairwise distance Dist_i,j between any two nodes i and j is positive (or larger than a certain threshold), then the similarity graph connects the two nodes using an edge [1]. The edge between the two nodes is weighted by the pairwise similarity S_i,j, where $S_{i, j} = \exp (- {(\frac{D i s t_{i, j}}{σ})}^{2})$ , for a specified kernel scale σ value.

spectralcluster supports these two methods of constructing a similarity graph:

Nearest neighbor method (if 'SimilarityGraph' is 'knn'(default)): spectralcluster connects points in X that are nearest neighbors. You can use the 'NumNeighbors' and 'KNNGraphType' name-value pair arguments to specify the options for constructing the nearest neighbor graph.
- Use 'NumNeighbors' to specify the number of nearest neighbors.
- Use 'KNNGraphType' to specify whether to make a 'complete' or 'mutual' connection of points.
Radius search method (if 'SimilarityGraph' is 'epsilon'): spectralcluster connects points whose pairwise distances are smaller than a search radius. You must specify the search radius for nearest neighbors used to construct the similarity graph by using the 'Radius' name-value pair argument.

Similarity Matrix

A similarity matrix is a matrix representation of a similarity graph. The n-by-n matrix $S = {(S_{i, j})}_{i, j = 1, \dots, n}$ contains pairwise similarity values between connected nodes in the similarity graph. The similarity matrix of a graph is also called an adjacency matrix.

The similarity matrix is symmetric because the edges of the similarity graph are directionless. A value of S_i,j = 0 means that nodes i and j of the similarity graph are not connected.

Degree Matrix

A degree matrix D_g is an n-by-n diagonal matrix obtained by summing the rows of the similarity matrix S. That is, the ith diagonal element of D_g is $D_{g} (i, i) = \sum_{j = 1}^{n} S_{i, j} .$

Laplacian Matrix

A Laplacian matrix is one way of representing a similarity graph. The spectralcluster function supports the unnormalized Laplacian matrix, the normalized Laplacian matrix using the Shi-Malik method [2], and the normalized Laplacian matrix using the Ng-Jordan-Weiss method [3].

The unnormalized Laplacian matrix L is the difference between the degree matrix and the similarity matrix.

$L = D_{g} - S .$
The normalized random-walk Laplacian matrix (Shi-Malik) is defined as:

$L_{r w} = D_{g}^{- 1} L .$
To derive L_rw, solve the generalized eigenvalue problem $L v = λ D_{g} v$ , where v is a column vector of length n, and λ is a scalar. The values of λ that satisfy the equation are the generalized eigenvalues of the matrix $L_{r w} = D_{g}^{- 1} L .$
You can use the MATLAB^® function eigs to solve the generalized eigenvalue problem.
The normalized symmetric Laplacian matrix (Ng-Jordan-Weiss) is defined as:

$L_{s} = D_{g}^{- 1 / 2} L D_{g}^{- 1 / 2} .$

Use the 'LaplacianNormalization' name-value pair argument to specify the method to normalize the Laplacian matrix.

Tips

Consider using spectral clustering when the clusters in your data do not naturally correspond to convex regions.
From the spectral clustering algorithm, you can estimate the number of clusters k as:
- The number of eigenvalues of the Laplacian matrix that are equal to 0.
- The number of connected components in your similarity graph representation. Use graph to create a similarity graph from a similarity matrix, and use conncomp to find the number of connected components in the graph.
For an example, see Estimate Number of Clusters and Perform Spectral Clustering.

Algorithms

Spectral clustering is a graph-based algorithm for clustering data points (or observations in X). The algorithm involves constructing a graph, finding its Laplacian matrix, and using this matrix to find k eigenvectors to split the graph k ways. By default, the algorithm for spectralcluster computes the normalized random-walk Laplacian matrix using the method described by Shi-Malik [2]. spectralcluster also supports the unnormalized Laplacian matrix and the normalized symmetric Laplacian matrix which uses the Ng-Jordan-Weiss method [3]. spectralcluster implements clustering as follows:

For each data point in X, define a local neighborhood using either the radius search method or nearest neighbor method, as specified by the 'SimilarityGraph' name-value pair argument (see Similarity Graph). Then, find the pairwise distances $D i s t_{i, j}$ for all points i and j in the neighborhood.
Convert the distances to similarity measures using the kernel transformation $S_{i, j} = \exp (- {(\frac{D i s t_{i, j}}{σ})}^{2})$ . The matrix S is the similarity matrix, and σ is the scale factor for the kernel, as specified using the 'KernelScale' name-value pair argument.
Calculate the unnormalized Laplacian matrix L , the normalized random-walk Laplacian matrix L_rw, or the normalized symmetric Laplacian matrix L_s, depending on the value of the 'LaplacianNormalization' name-value pair argument.
Create a matrix $V \in ℝ^{n \times k}$ containing columns $v_{1}, \dots, v_{k}$ , where the columns are the k eigenvectors that correspond to the k smallest eigenvalues of the Laplacian matrix. If using L_s, normalize each row of V to have unit length.
Treating each row of V as a point, cluster the n points using k-means clustering (default) or k-medoids clustering, as specified by the 'ClusterMethod' name-value pair argument.
Assign the original points in X to the same clusters as their corresponding rows in V.

References

[1] Von Luxburg, U. “A Tutorial on Spectral Clustering.” Statistics and Computing Journal. Vol.17, Number 4, 2007, pp. 395–416.

[2] Shi, J., and J. Malik. “Normalized cuts and image segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 22, 2000, pp. 888–905.

[3] Ng, A.Y., M. Jordan, and Y. Weiss. “On spectral clustering: Analysis and an algorithm.” In Proceedings of the Advances in Neural Information Processing Systems 14. MIT Press, 2001, pp. 849–856.

Version History

Introduced in R2019b

spectralcluster

Syntax

Description

Examples

Perform Spectral Clustering on Input Data

Perform Spectral Clustering on Similarity Matrix

Cluster Using Radius Search for Similarity Graph

Find Eigenvalues and Eigenvectors of Laplacian Matrix

Input Arguments

`X` — Input data
numeric matrix

`S` — Similarity matrix
symmetric matrix

`k` — Number of clusters
positive integer

Name-Value Arguments

`Distance` — Distance metric
character vector | string scalar | function handle

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

`Scale` — Scaling factors for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | numeric vector of nonnegative values

`SimilarityGraph` — Type of similarity graph
`'knn'` (default) | `'epsilon'`

`NumNeighbors` — Number of nearest neighbors
`log(size(X,1))` (default) | positive integer

`KNNGraphType` — Type of nearest neighbor graph
`'complete'` (default) | `'mutual'`

`Radius` — Search radius
nonnegative scalar

`KernelScale` — Scale factor
`1` (default) | `'auto'` | positive scalar

`LaplacianNormalization` — Method to normalize Laplacian matrix
`'randomwalk'` (default) | `'symmetric'` | `'none'`

`ClusterMethod` — Clustering method
`'kmeans'` (default) | `'kmedoids'`

Output Arguments

`idx` — Cluster indices
numeric column vector

`V` — Eigenvectors
numeric matrix

`D` — Eigenvalues
numeric vector

More About

Similarity Graph

Similarity Matrix

Degree Matrix

Laplacian Matrix

Tips

Algorithms

References

Version History

See Also

Topics

spectralcluster

Syntax

Description

Examples

Perform Spectral Clustering on Input Data

Perform Spectral Clustering on Similarity Matrix

Cluster Using Radius Search for Similarity Graph

Find Eigenvalues and Eigenvectors of Laplacian Matrix

Input Arguments

X — Input data numeric matrix

S — Similarity matrix symmetric matrix

k — Number of clusters positive integer

Name-Value Arguments

Distance — Distance metric character vector | string scalar | function handle

P — Exponent for Minkowski distance metric 2 (default) | positive scalar

Cov — Covariance matrix for Mahalanobis distance metric cov(X,'omitrows') (default) | positive definite matrix

Scale — Scaling factors for standardized Euclidean distance metric std(X,'omitnan') (default) | numeric vector of nonnegative values

SimilarityGraph — Type of similarity graph 'knn' (default) | 'epsilon'

NumNeighbors — Number of nearest neighbors log(size(X,1)) (default) | positive integer

KNNGraphType — Type of nearest neighbor graph 'complete' (default) | 'mutual'

Radius — Search radius nonnegative scalar

KernelScale — Scale factor 1 (default) | 'auto' | positive scalar

LaplacianNormalization — Method to normalize Laplacian matrix 'randomwalk' (default) | 'symmetric' | 'none'

ClusterMethod — Clustering method 'kmeans' (default) | 'kmedoids'

Output Arguments

idx — Cluster indices numeric column vector

V — Eigenvectors numeric matrix

D — Eigenvalues numeric vector

More About

Similarity Graph

Similarity Matrix

Degree Matrix

Laplacian Matrix

Tips

Algorithms

References

Version History

See Also

Topics

`X` — Input data
numeric matrix

`S` — Similarity matrix
symmetric matrix

`k` — Number of clusters
positive integer

`Distance` — Distance metric
character vector | string scalar | function handle

`P` — Exponent for Minkowski distance metric
`2` (default) | positive scalar

`Cov` — Covariance matrix for Mahalanobis distance metric
`cov(X,'omitrows')` (default) | positive definite matrix

`Scale` — Scaling factors for standardized Euclidean distance metric
`std(X,'omitnan')` (default) | numeric vector of nonnegative values

`SimilarityGraph` — Type of similarity graph
`'knn'` (default) | `'epsilon'`

`NumNeighbors` — Number of nearest neighbors
`log(size(X,1))` (default) | positive integer

`KNNGraphType` — Type of nearest neighbor graph
`'complete'` (default) | `'mutual'`

`Radius` — Search radius
nonnegative scalar

`KernelScale` — Scale factor
`1` (default) | `'auto'` | positive scalar

`LaplacianNormalization` — Method to normalize Laplacian matrix
`'randomwalk'` (default) | `'symmetric'` | `'none'`

`ClusterMethod` — Clustering method
`'kmeans'` (default) | `'kmedoids'`

`idx` — Cluster indices
numeric column vector

`V` — Eigenvectors
numeric matrix

`D` — Eigenvalues
numeric vector