How does the 'ward linkage' during cluster analysis work?

Franziska Ba on 2 Dec 2019

I have the following problem: I would like to examine my data with a cluster analysis.

As the distance measure (similarity measure) I use 'correlation'. As the 'Linkage' I use 'ward', because it is reputed to recover the "real" clusters best (I know that 'ward' is actually meant to be used with 'euclidean'). Furthermore, as I understand it, 'ward' should not merge the objects that have the smallest distance from each other, but the objects whose merger least increases a given variance criterion.

cgObj = clustergram(data(2:264,:),'Standardize',2 ,'Colormap','jet','RowPDist','correlation', ...
    'ColumnPDist','correlation' ,'Linkage','ward','DisplayRange',1, 'Symmetric',1, 'Cluster',1);

I have now worked through the theoretical procedure behind this code using a simple example.

First, the similarity is quantified for each pair of objects and stored in an internal (not displayed) distance matrix. After that, the two objects whose merger increases the variance the least should be grouped. Then the similarities between the newly created group and the remaining objects are recomputed. This is followed by a new grouping step, and so on.
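The steps above can be sketched outside of clustergram with pdist and linkage. This is only a sketch under assumptions: the toy matrix X is made up, and it is assumed (not confirmed here) that clustergram performs an equivalent pdist-then-linkage sequence internally. It shows that linkage receives only the distance vector y, so every merge is derived from those distances.

```matlab
% Toy data (made up): 5 objects (rows), 4 variables (columns).
X = [1 2 3 5;
     2 4 5 9;
     5 3 2 1;
     6 4 1 2;
     1 5 2 6];

% Step 1: pairwise dissimilarities, here 1 minus the correlation between rows.
y = pdist(X, 'correlation');   % condensed vector with n*(n-1)/2 entries
D = squareform(y);             % same values as a symmetric n-by-n matrix

% Remaining steps: agglomerative merging driven only by the vector y.
Z = linkage(y, 'ward');

% Each row of Z describes one merge:
% [index of cluster A, index of cluster B, merge height].
disp(D)
disp(Z)
```

Because MATLAB scales the Ward distance so that two single objects keep their original distance, the first row of Z should join the pair with the smallest entry in y.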

As far as I understand it, the variance that should increase as little as possible is not calculated from the distance matrix but from the original data matrix.
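One way to test this assumption is a small numerical check with Euclidean distances, where the variance criterion has its usual meaning. The sketch below uses made-up points and relies on the scaling stated in the linkage documentation for 'ward', under which half the squared merge height equals the increase in the within-cluster sum of squares. With 'correlation' distances the heights are still computed from the distance vector, but this variance interpretation is not guaranteed.

```matlab
% Toy data (made up): 5 points in the plane.
X = [0 0;
     1 0;
     4 0;
     4 3;
     9 9];

% Ward linkage computed from the Euclidean distance vector only.
y = pdist(X, 'euclidean');
Z = linkage(y, 'ward');

% Variance increase of the first merge, computed directly from the data:
% sum of squared deviations of the two merged points from their mean.
pair = Z(1, 1:2);
P = X(pair, :);
firstIncrease = sum(sum((P - repmat(mean(P, 1), 2, 1)).^2));

% Total sum of squared deviations of all points from the overall mean.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);
totalESS = sum(sum(Xc.^2));

% Compare with the values implied by the merge heights in Z.
fprintf('first merge: %.4f (data) vs %.4f (from Z)\n', ...
        firstIncrease, Z(1,3)^2 / 2);
fprintf('all merges:  %.4f (data) vs %.4f (from Z)\n', ...
        totalESS, sum(Z(:,3).^2) / 2);
```

If the two columns agree, the variance criterion was recovered from the distances alone, without going back to the data matrix.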

Why is the distance matrix calculated if it is not used by 'linkage' anyway? Or have I misunderstood something? I would like to understand the exact grouping procedure during cluster analysis.

I am grateful for any suggestion!