evalclusters function gives different results on each run

Question

1 vote

Hi,

I am using the evalclusters function to estimate the number of clusters for k-means clustering, like this:

eval = evalclusters(data, 'kmeans', 'gap', 'klist', [1:10],'B', 50, 'SearchMethod', 'firstMaxSE');

However, each time I run the function, it gives a different number of clusters, which confuses me.

Could you please help explain this behavior? Also, which parameter settings (e.g., klist, the number of reference data sets B, the search method, the reference distribution) are most suitable if I want to use the gap criterion, for instance?

Thank you!

Cheers,

Ni


KAE on 28 Feb 2018

I am getting different results each time too, but they do not differ by much (K = 8, 9, or 10).


Answer 1

Walter Roberson on 28 Feb 2018

2 votes

https://www.mathworks.com/help/stats/kmeans.html#input_argument_namevalue_d119e436149

"Start: Method for choosing initial cluster centroid positions (or seeds), specified as the comma-separated pair consisting of 'Start' and 'cluster', 'plus', 'sample', 'uniform', a numeric matrix, or a numeric array. This table summarizes the available options for choosing seeds."

Notice that all of the options in the table except the numeric matrix or numeric array involve random selection, which is going to have results that depend upon the state of the random number generator.

You have two choices:

You can provide the Start option with a numeric matrix or numeric array of exact initial cluster positions; or
You can set the random number generator to the same state before each run.


Sina Baghali on 31 Jan 2020

The 'Start' parameter is not valid in the evalclusters command. How can I set the random number generator to a fixed value?

Walter Roberson on 1 Feb 2020

evalclusters permits you to pass a function handle as the method, which could be a call to kmeans with the Start parameter set.
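A minimal sketch of the function-handle approach, assuming the Fisher iris sample data (meas) that ships with Statistics and Machine Learning Toolbox. The helper name clustFcn and the rule for picking starting centroids (evenly spaced rows of the data) are illustrative assumptions, not something evalclusters prescribes. The silhouette criterion is used here because the gap criterion also draws random reference data sets, so fixing the kmeans start alone would not make a gap evaluation repeatable.

```matlab
load fisheriris   % sample data: meas is a 150-by-4 numeric matrix

% Hypothetical helper: cluster D into k groups with kmeans, starting from
% k evenly spaced rows of D so that no random seeding is involved.
clustFcn = @(D, k) kmeans(D, k, 'Start', D(round(linspace(1, size(D, 1), k)), :));

% evalclusters calls clustFcn(meas, k) for every k in KList.
eva1 = evalclusters(meas, clustFcn, 'silhouette', 'KList', 2:6);
eva2 = evalclusters(meas, clustFcn, 'silhouette', 'KList', 2:6);

% With fixed starting centroids, both evaluations should agree.
sameK = isequal(eva1.OptimalK, eva2.OptimalK);
disp(sameK)
```

Any deterministic rule for the k starting rows would serve the same purpose; evenly spaced rows are just easy to write down.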

The documentation for evalclusters shows an example that runs kmeans in a loop for different numbers of clusters and passes the results to evalclusters for analysis.

If you do either of the above, you can use the same starting points for each of the kmeans runs, and so directly compare the effects of using different numbers of clusters for the same configuration.
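The looping approach could look roughly like the sketch below, again assuming the fisheriris sample data. The variable startPool and the choice of evenly spaced rows as starting centroids are hypothetical. Per the evalclusters documentation, a matrix of precomputed clustering solutions is accepted for the CalinskiHarabasz, DaviesBouldin, and silhouette criteria, so this sketch does not use gap.

```matlab
load fisheriris   % sample data: meas is a 150-by-4 numeric matrix

kList = 1:6;
n = size(meas, 1);

% Hypothetical fixed pool of starting centroids: evenly spaced rows of meas.
% The run with k clusters starts from the first k rows of this pool.
startPool = meas(round(linspace(1, n, numel(kList))), :);

% One column of cluster indices per candidate number of clusters.
clust = zeros(n, numel(kList));
for i = 1:numel(kList)
    k = kList(i);
    clust(:, i) = kmeans(meas, k, 'Start', startPool(1:k, :));
end

% Evaluate the precomputed solutions.
eva = evalclusters(meas, clust, 'CalinskiHarabasz');
disp(eva.OptimalK)
```

Because the starting sets are nested, the run with k+1 clusters begins from the same centroids as the run with k clusters plus one more, which keeps the comparison across k on a common footing.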

Or you could use

rng(655321)

before each call to evalclusters(). If you do this then you will be able to replicate the evalclusters() results, but each of the individual kmeans calls will use a different set of starting centroids, which makes it more difficult to directly compare the effects of using a different number of clusters versus differences due to different starting points.
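To check this kind of reproducibility, a sketch along these lines can be used. It assumes the fisheriris sample data and a smaller B than in the question, only to keep the run short; the seed value is arbitrary.

```matlab
load fisheriris   % sample data: meas is a 150-by-4 numeric matrix

seed = 655321;    % any fixed nonnegative integer works

% Reset the generator to the same state before each evaluation.
rng(seed);
eva1 = evalclusters(meas, 'kmeans', 'gap', 'KList', 1:6, 'B', 20);

rng(seed);
eva2 = evalclusters(meas, 'kmeans', 'gap', 'KList', 1:6, 'B', 20);

% Both runs consume the same random stream, so the gap values and the
% suggested number of clusters are expected to match.
sameValues = isequal(eva1.CriterionValues, eva2.CriterionValues);
sameK = isequal(eva1.OptimalK, eva2.OptimalK);
disp([sameValues, sameK])
```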
