Cluster Visualization and Evaluation

Plot clusters of data and evaluate the optimal number of clusters

Cluster analysis organizes data into groups based on similarities between the data points. Sometimes the data contains natural divisions that indicate the appropriate number of clusters. Other times, the data does not contain natural divisions, or the natural divisions are unknown. In such cases, you can determine the optimal number of clusters for grouping your data.

To determine how well the data fits into a particular number of clusters, compute index values using different evaluation criteria, such as gap or silhouette. Visualize clusters by creating a dendrogram plot to display a hierarchical binary cluster tree. Optimize the leaf order to maximize the sum of the similarities between adjacent leaves. For grouped data with multiple measurements for each group, create a dendrogram plot based on the group means computed using a multivariate analysis of variance (MANOVA).
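As a minimal sketch of the workflow described above, the following example evaluates k-means solutions with the silhouette criterion and then plots a dendrogram with optimized leaf order. The synthetic data, the candidate cluster counts (2 to 5), and the choice of average linkage are assumptions made for illustration only. The code requires Statistics and Machine Learning Toolbox.

```matlab
% Minimal sketch; the data and parameter choices below are illustrative assumptions.
rng('default');                       % reproducible synthetic data and k-means starts
X = [randn(10,2); randn(10,2) + 4];   % hypothetical data: two well-separated groups

% Evaluate k-means solutions for 2 to 5 clusters using the silhouette criterion.
eva = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:5);
fprintf('Suggested number of clusters: %d\n', eva.OptimalK);

% Build a hierarchical binary cluster tree from the pairwise distances.
D = pdist(X);                         % Euclidean distances by default
tree = linkage(D, 'average');         % average linkage (an arbitrary choice here)

% Reorder the leaves to maximize the sum of similarities between adjacent leaves.
leafOrder = optimalleaforder(tree, D);

% Plot the tree using the optimized leaf order.
figure
dendrogram(tree, 'Reorder', leafOrder);
```

To compare criteria, you can replace 'silhouette' with another supported criterion name, such as 'gap', and call plot(eva) to view the criterion values across the candidate numbers of clusters.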

Live Editor Tasks

Cluster Data

Cluster data using k-means or hierarchical clustering in the Live Editor (Since R2021b)

Functions

Cluster Visualization

`dendrogram`	Dendrogram plot
`optimalleaforder`	Optimal leaf ordering for hierarchical clustering
`manovacluster`	Dendrogram of group mean clusters following MANOVA
`silhouette`	Silhouette plot

Cluster Evaluation

`evalclusters`	Evaluate clustering solutions
`addK`	Evaluate additional numbers of clusters
`compact`	Compact clustering evaluation object
`increaseB`	Increase reference data sets
`plot`	Plot clustering evaluation object criterion values

Objects

`CalinskiHarabaszEvaluation`	Calinski-Harabasz criterion clustering evaluation object
`DaviesBouldinEvaluation`	Davies-Bouldin criterion clustering evaluation object
`GapEvaluation`	Gap criterion clustering evaluation object
`SilhouetteEvaluation`	Silhouette criterion clustering evaluation object

Topics

Evaluate Optimal Number of Clusters
Identify the optimal number of clusters in a data set by using the `evalclusters` function.