# silhouette

## Syntax

```
silhouette(X,clust)
s = silhouette(X,clust)
[s,h] = silhouette(X,clust)
[...] = silhouette(X,clust,metric)
[...] = silhouette(X,clust,distfun,p1,p2,...)
```

## Description

`silhouette(X,clust)` plots cluster silhouettes for the n-by-p data matrix `X`, with clusters defined by `clust`. Rows of `X` correspond to points, columns correspond to coordinates. `clust` can be a categorical variable, numeric vector, character matrix, string array, or cell array of character vectors containing a cluster name for each point. `silhouette` treats `NaN`s and empty values in `clust` as missing values, and ignores the corresponding rows of `X`. By default, `silhouette` uses the squared Euclidean distance between points in `X`.

`s = silhouette(X,clust)` returns the silhouette values in the n-by-1 vector `s`, but does not plot the cluster silhouettes.

`[s,h] = silhouette(X,clust)` plots the silhouettes and returns the silhouette values in the n-by-1 vector `s` and the figure handle in `h`.

`[...] = silhouette(X,clust,metric)` plots the silhouettes using the inter-point distance function specified in `metric`. Choices for `metric` are given in the following table.

| Metric | Description |
| --- | --- |
| `'Euclidean'` | Euclidean distance |
| `'sqEuclidean'` | Squared Euclidean distance (default) |
| `'cityblock'` | Sum of absolute differences |
| `'cosine'` | One minus the cosine of the included angle between points (treated as vectors) |
| `'correlation'` | One minus the sample correlation between points (treated as sequences of values) |
| `'Hamming'` | Percentage of coordinates that differ |
| `'Jaccard'` | Percentage of nonzero coordinates that differ |
| Vector | A numeric distance matrix in upper triangular vector form, such as is created by `pdist`. `X` is not used in this case, and can safely be set to `[]`. |

For more information on each metric, see Distance Metrics.
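As a sketch of the vector form of `metric`, the snippet below passes a precomputed `pdist` result to `silhouette` with `X` set to `[]`, and also computes the values with the named `'cityblock'` metric for comparison. The variable names `D`, `s1`, and `s2` are illustrative, and the two results are expected to agree only up to floating-point rounding.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2,'Distance','cityblock');

% Pairwise city block distances in upper triangular vector form
D = pdist(X,'cityblock');

% X is not used when a distance vector is supplied, so pass []
s1 = silhouette([],cidx,D);

% Same computation using the named metric, for comparison
s2 = silhouette(X,cidx,'cityblock');
maxDiff = max(abs(s1 - s2));
```

Precomputing the distances this way can be convenient when the same distance vector is reused across several clusterings of the same data.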

`[...] = silhouette(X,clust,distfun,p1,p2,...)` accepts a function handle `distfun` to a metric of the form

`d = distfun(X0,X,p1,p2,...)`

where `X0` is a `1`-by-`p` point, `X` is an `n`-by-`p` matrix of points, and `p1,p2,...` are optional additional arguments. The function `distfun` returns an `n`-by-`1` vector `d` of distances between `X0` and each point (row) in `X`. The arguments `p1,p2,...` are passed directly to the function `distfun`.
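A minimal sketch of the function-handle form follows. The handle `minkdist` and its exponent argument `ex` are hypothetical names introduced here for illustration; they implement a Minkowski-style distance and are not part of the toolbox.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];
cidx = kmeans(X,2);

% Hypothetical distance function (illustration only).
% X0 is 1-by-p, X is n-by-p, ex is the Minkowski exponent.
% Returns an n-by-1 vector of distances from X0 to each row of X.
minkdist = @(X0,X,ex) sum(abs(bsxfun(@minus,X,X0)).^ex,2).^(1/ex);

% The trailing argument 3 is forwarded to minkdist as ex
s = silhouette(X,cidx,minkdist,3);
```

With `ex` set to 2, this handle computes the ordinary Euclidean distance, so the result should match `silhouette(X,cidx,'Euclidean')` up to rounding.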

## Examples

Create a silhouette plot from clustered data.

Generate random sample data.

```
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];
```

Cluster the data in `X` using `kmeans`.

`cidx = kmeans(X,2);`

Create a silhouette plot from the clustered data.

`silhouette(X,cidx)`

Compute the silhouette values from clustered data.

Generate random sample data.

```
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2);randn(10,2)-ones(10,2)];
```

Use `kmeans` to cluster the data in `X` using the city block distance (the sum of absolute differences).

`cidx = kmeans(X,2,'distance','cityblock');`

Compute the silhouette values from the clustered data. Specify `metric` as `'cityblock'` to indicate that the `kmeans` clustering is based on the sum of absolute differences.

`s = silhouette(X,cidx,'cityblock')`
```
s = 20×1

    0.0816
    0.5848
    0.1906
    0.2781
    0.3954
    0.4050
    0.0897
    0.5416
    0.6203
    0.6664
      ⋮
```

## More About

### Silhouette Value

The silhouette value for each point is a measure of how similar that point is to points in its own cluster, when compared to points in other clusters. The silhouette value for the `i`th point, `Si`, is defined as

`Si = (bi-ai)/max(ai,bi)`

where `ai` is the average distance from the `i`th point to the other points in the same cluster as `i`, and `bi` is the minimum average distance from the `i`th point to points in a different cluster, minimized over clusters.
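The definition above can be sketched directly in code. The snippet below computes the silhouette values by hand for a tiny, made-up data set using the squared Euclidean distance (the default metric). The data, the variable names, and the assumption that every cluster contains at least two points are all choices made for this illustration; this is not the toolbox implementation.

```matlab
% Made-up data: four points on a line, two clusters of two points each
X = [0; 1; 5; 6];
clust = [1; 1; 2; 2];
n = size(X,1);

% Pairwise squared Euclidean distances
D = zeros(n);
for i = 1:n
    for j = 1:n
        D(i,j) = sum((X(i,:) - X(j,:)).^2);
    end
end

labels = unique(clust);
s = zeros(n,1);
for i = 1:n
    % ai: average distance to the other points in the same cluster
    own = (clust == clust(i));
    own(i) = false;              % exclude the point itself
    a = mean(D(i,own));          % assumes the cluster has >= 2 points

    % bi: smallest average distance to the points of any other cluster
    b = Inf;
    for k = labels(:)'
        if k ~= clust(i)
            b = min(b, mean(D(i,clust == k)));
        end
    end

    s(i) = (b - a) / max(a,b);
end
% s is approximately [0.9672; 0.9512; 0.9512; 0.9672]
```

For the first point, `ai` is 1 and `bi` is (25 + 36)/2 = 30.5, which gives 29.5/30.5 ≈ 0.9672. The two inner points sit slightly closer to the other cluster, so their values are a little lower.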

The silhouette value ranges from -1 to +1. A high silhouette value indicates that `i` is well-matched to its own cluster, and poorly-matched to neighboring clusters. If most points have a high silhouette value, then the clustering solution is appropriate. If many points have a low or negative silhouette value, then the clustering solution may have either too many or too few clusters. The silhouette clustering evaluation criterion can be used with any distance metric.
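One way to act on this guidance, offered here as a heuristic sketch rather than a documented procedure, is to compare the average silhouette value across several candidate cluster counts. The candidate range `ks`, the use of five `kmeans` replicates, and the variable names are assumptions made for this illustration.

```matlab
rng default  % For reproducibility
X = [randn(10,2)+ones(10,2); randn(10,2)-ones(10,2)];

ks = 2:4;                         % candidate numbers of clusters (assumed range)
avgSil = zeros(size(ks));
for idx = 1:numel(ks)
    cidx = kmeans(X,ks(idx),'Replicates',5);
    % Requesting an output returns the values without plotting
    avgSil(idx) = mean(silhouette(X,cidx));
end

% Pick the candidate with the highest average silhouette value
[~,best] = max(avgSil);
bestK = ks(best);
```

A higher average suggests that more points are well matched to their own clusters, but inspecting the per-point values or the silhouette plot is still advisable before settling on a cluster count.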

## References

 Kaufman L., and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: John Wiley & Sons, Inc., 1990.