Esta página aún no se ha traducido para esta versión. Puede ver la versión más reciente de esta página en inglés.

cluster

Construct agglomerative clusters from linkages

Sintaxis

T = cluster(Z,'cutoff',c)
T = cluster(Z,'cutoff',c,'depth',d)
T = cluster(Z,'cutoff',c,'criterion',criterion)
T = cluster(Z,'maxclust',n)

Description

T = cluster(Z,'cutoff',c) constructs clusters from the agglomerative hierarchical cluster tree, Z, as generated by the linkage function. Z is a matrix of size (m – 1)-by-3, where m is the number of observations in the original data. c is a threshold for cutting Z into clusters. Clusters are formed when a node and all of its subnodes have inconsistent value less than c. All leaves at or below the node are grouped into a cluster. t is a vector of size m containing the cluster assignments of each observation.

If c is a vector, T is a matrix of cluster assignments with one column per cutoff value.

T = cluster(Z,'cutoff',c,'depth',d) evaluates inconsistent values by looking to a depth d below each node. The default depth is 2.

T = cluster(Z,'cutoff',c,'criterion',criterion) uses the specified criterion for forming clusters, where criterion is 'inconsistent' (default) or 'distance'. The 'distance' criterion uses the distance between the two subnodes merged at a node to measure node height. All leaves at or below a node with height less than c are grouped into a cluster.

T = cluster(Z,'maxclust',n) constructs a maximum of n clusters using the 'distance' criterion. cluster finds the smallest height at which a horizontal cut through the tree leaves n or fewer clusters.

If n is a vector, T is a matrix of cluster assignments with one column per maximum value.

Ejemplos

contraer todo

Compare cluster assignments of flowers to their species classification.

Load the sample data.

load fisheriris

Compute three clusters of the Fisher iris data using the 'average' method and the 'chebychev' metric.

Z = linkage(meas,'average','chebychev');
c = cluster(Z,'maxclust',3);

Create a dendrogram plot of Z. To see the three clusters, use 'ColorThreshold' with a cutoff halfway between the third-from-last and second-from-last linkages.

cutoff = median([Z(end-2,3) Z(end-1,3)]);
dendrogram(Z,'ColorThreshold',cutoff)

Display the last two rows of Z to see how the three clusters are combined into one. linkage combines the 293rd (blue) cluster with the 297th (red) cluster to form the 298th cluster with a linkage of 1.7583. linkage then combines the 296th (green) cluster with the 298th cluster.

lastTwo = Z(end-1:end,:)
lastTwo = 2×3

  293.0000  297.0000    1.7583
  296.0000  298.0000    3.4445

See how the cluster assignments correspond to the three species. For example, one of the clusters contains 50 flowers of the second species and 40 flowers of the third species.

crosstab(c,species)
ans = 3×3

     0     0    10
     0    50    40
    50     0     0

Randomly generate sample data with 20,000 observations.

rng('default') % For reproducibility
X = rand(20000,3);

Create a hierarchical cluster tree using Ward's linkage. In this case, 'savememory' is set to 'on' by default. In general, choose the best value for 'savememory' based on the dimensions of X and the available memory.

Z = linkage(X,'ward');

Cluster data into four groups and plot the result.

c = cluster(Z,'Maxclust',4);
scatter3(X(:,1),X(:,2),X(:,3),10,c)

Consulte también

| | | |

Introducido antes de R2006a