K-means for Wine data set

Question

1 vote

Hi,

I ran the k-means algorithm on the Wine data set from the UCI repository. This data set contains the chemical analysis of 178 wines derived from three different cultivars. Wine type is based on 13 continuous features.

Here are the commands:

load 'wine_data.txt';

[IDX,C,sumd,D] = kmeans(wine_data,3,...
    'start','sample',...
    'Replicates',100,...
    'maxiter',1000, 'display','final');

The final best total sum of distances is 2.37069e+06. This is far from the K-means solution reported in the literature, which is around 18,061. Is MATLAB's k-means solution stuck in a local minimum? Please advise. Thanks.

1 comment

the cyclist on 27 Aug 2013

For anyone who is interested in helping out on this one, the data set is here: http://archive.ics.uci.edu/ml/datasets/Wine


Answer 1

Shashank Prasanna on 27 Aug 2013

0 votes

Ganesh, what distance metric does the 'literature' use?

The kmeans default is 'sqEuclidean'. You have to make sure you are comparing the same metric. Try changing it to cityblock or any of the other options:

http://www.mathworks.com/help/stats/kmeans.html
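As a rough sketch of the metric check suggested above, the snippet below reports the total for one partition in three ways: squared Euclidean (what kmeans minimizes by default), plain Euclidean, and city block. The synthetic matrix X is a hypothetical stand-in for the wine features, and whether the literature figure was computed with any of these conventions is not known from this thread.

```matlab
% Synthetic stand-in data (hypothetical); replace X with the wine feature matrix.
rng(0);
X = [randn(50,2); ...
     bsxfun(@plus, randn(50,2), [5  5]); ...
     bsxfun(@plus, randn(50,2), [5 -5])];

% Default metric: squared Euclidean. D(i,j) is the squared distance
% from point i to centroid j.
[idx, C, sumd, D] = kmeans(X, 3, 'Distance', 'sqeuclidean', 'Replicates', 10);

% Squared distance from each point to its own assigned centroid.
n = size(X, 1);
dAssigned = D(sub2ind(size(D), (1:n)', idx));

totalSq  = sum(sumd);             % total that 'display','final' reports
totalEuc = sum(sqrt(dAssigned));  % same partition, unsquared distances

% A separate run that minimizes city-block (L1) distances instead.
[~, ~, sumdCity] = kmeans(X, 3, 'Distance', 'cityblock', 'Replicates', 10);
totalCity = sum(sumdCity);

fprintf('sqeuclidean: %g\neuclidean:   %g\ncityblock:   %g\n', ...
        totalSq, totalEuc, totalCity);
```

The three totals can differ by orders of magnitude on the same data, so a published number is only comparable once its distance convention is known.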


Answer 2

Ganesh on 27 Aug 2013

0 votes

Thanks for the reply, Shashank. The literature used 'sqEuclidean' and so did I.

1 comment

tryhard on 29 Aug 2013

Could you post a link to the relevant article? I get the same result you do. It seems they might have performed some sort of pre-processing on the data.


Answer 3

gheorghe gardu on 1 Nov 2015

0 votes

Could you post the MATLAB code that you used? Thank you in advance.


Answer 4

Paul Munro on 21 Feb 2023

0 votes

The large distance sum you report makes me think that you did not rescale the data. Variable 13 is in the thousands and will overwhelm the effect of the other variables. You will probably get better results if you rescale the variables separately (Z scoring for example).
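A minimal sketch of that rescaling, assuming the Statistics Toolbox function zscore is available alongside kmeans. The matrix X here is synthetic and hypothetical, with one column deliberately put on a scale of hundreds to mimic a dominant variable. For the real data, X would be the 13 feature columns only; in the UCI wine.data file the first column is the cultivar label, so it would need to be dropped if wine_data.txt keeps that layout.

```matlab
% Synthetic stand-in data (hypothetical): three groups, three features.
rng(0);
X = [randn(40,3); ...
     bsxfun(@plus, randn(40,3), [ 3 3 3]); ...
     bsxfun(@plus, randn(40,3), [-3 3 0])];
X(:,3) = 500 * X(:,3) + 750;   % put one column on a much larger scale

% Standardize each column to zero mean and unit standard deviation.
Xz = zscore(X);

% Same options as in the question, on raw and on standardized data.
[idxRaw, ~, sumdRaw] = kmeans(X,  3, 'Start', 'sample', ...
                              'Replicates', 20, 'MaxIter', 1000);
[idxZ,   ~, sumdZ]   = kmeans(Xz, 3, 'Start', 'sample', ...
                              'Replicates', 20, 'MaxIter', 1000);

fprintf('total (raw):      %g\ntotal (z-scored): %g\n', ...
        sum(sumdRaw), sum(sumdZ));
```

The total on standardized data is measured in different units from the raw total, so it cannot be compared directly with a figure computed on unscaled features.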
