Using Mahalanobis distance in hierarchical cluster analysis error

Sriparna Sen
Sriparna Sen on 4 Mar 2020
Commented: Sriparna Sen on 12 Mar 2020
Hi! Thank you in advance for the help! I am currently building a hierarchical cluster tree using the linkage function in MATLAB. I call the function as follows:
links = linkage(samples,'complete', 'mahalanobis');
My variable, samples, is a 25 x 106720 matrix, class double, that contains t values.
Every time I run this in MATLAB, however, I get the following error message:
Error using *
Requested 106720x106720 (84.9GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more
information.
Error in nancov>localcov (line 173)
c = xc' * xc / denom;
Error in nancov (line 116)
c = localcov(x,domle);
Error in pdist (line 181)
additionalArg = nancov(X);
Error in linkage (line 259)
Z = internal.stats.linkagemex(Y,method,pdistArg, memEff);
How do I get around this error, or is there another way for me to calculate the Mahalanobis distance for hierarchical clustering?

Answers (1)

Rajani Mishra
Rajani Mishra on 11 Mar 2020
The error occurs because the Mahalanobis distance needs the covariance matrix of the variables (columns). For your data "samples" of size 25 x 106720, linkage calls pdist, which computes that covariance with "nancov()". The result is a 106720 x 106720 matrix, which exceeds the maximum array size preference.
One option is to reduce the number of columns with dimensionality reduction before clustering. I came across literature discussing this approach while researching your question, and you may want to consult it as well. You can use the function "pca()" for dimensionality reduction. To learn more about "pca()", see: https://www.mathworks.com/help/stats/pca.html
Alternatively, you can look into tall arrays for storing the data. Tall arrays are designed for working with out-of-memory data, though it is worth checking in the documentation which functions accept them. For more information, see: https://www.mathworks.com/help/stats/examples/statistics-and-machine-learning-with-big-data-using-tall-arrays.html
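As a quick check of the size reported in the error message, the memory needed for a full double-precision covariance matrix can be estimated by hand. This is only a back-of-the-envelope sketch; the variable names are made up for illustration.

```matlab
p = 106720;                      % number of columns (variables) in samples
bytesNeeded = p^2 * 8;           % each double takes 8 bytes
gibNeeded = bytesNeeded / 2^30;  % convert bytes to GiB
fprintf('%.1f GB\n', gibNeeded); % matches the figure in the error message
```

The cost grows with the square of the number of columns, so the 25 rows play no part in the problem.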
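A minimal sketch of the PCA route might look like the following. It uses random stand-in data with fewer columns so that it runs quickly, and the choice of k = 5 components is an arbitrary assumption, not a recommendation; how many components to keep depends on your data.

```matlab
% Stand-in for the real data: 25 observations, many variables.
% (Assumption: random values and fewer columns than the real 106720.)
rng(0);
samples = randn(25, 2000);

% Keep only the first k principal component scores.
% k = 5 is an arbitrary illustrative choice.
k = 5;
[~, score] = pca(samples, 'NumComponents', k);   % score is 25-by-k

% pdist now only needs a k-by-k covariance matrix.
links = linkage(score, 'complete', 'mahalanobis');
```

With 25 observations there are at most 24 meaningful components. Keeping all 24 is unlikely to help, because the Mahalanobis distance rescales every retained direction to unit variance and the 25 points then end up equally far from one another, so k should stay well below that limit.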
