Problems clustering with kmeans

Question

I have a data set consisting of three vectors, x, y, and v, each about 2 million rows long. The data are scattered, i.e., not at regular x,y intervals. An x-y plot of a small section is attached. In the x-y plot you can see groupings of 9 data points each; there will be about 200K of these groupings. I need to cluster the data set and get an average v value for each cluster of 9 points.

I have used kmeans clustering:

[idx,CC] = kmeans([x,y],round(length(x)/9),'Replicates',10,'Options',statset('UseParallel',1));

This works for about 66% of the clusters: roughly 66% of the clusters it finds contain 9 data points, as desired. The remaining clusters contain somewhat fewer or more points. Also attached is a histogram of the number of data points in each cluster that kmeans returns. The large peak at 9 is what I want and accounts for 66% of the clusters.
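The share of correctly sized clusters described above can be measured directly from the idx output of kmeans. The following is only a minimal sketch on synthetic data: the 3x3 layout of each group, the group centers, and the variable names (xy, clusterSize, fracNine, sizeCounts) are assumptions for illustration, not the original data.

```matlab
% Synthetic stand-in data (assumption): 100 well-separated groups of 9 points
rng(0);                                   % reproducible kmeans initialization
[cx, cy] = meshgrid(1:10, 1:10);          % hypothetical group centers
nGroups = numel(cx);
[ox, oy] = meshgrid(-1:1);                % assumed 3x3 layout -> 9 points per group
offsets = 0.01*[ox(:), oy(:)];
xy = repelem([cx(:), cy(:)], 9, 1) + repmat(offsets, nGroups, 1);

idx = kmeans(xy, nGroups, 'Replicates', 3);

% Count the points assigned to each cluster, then tally the sizes
clusterSize = accumarray(idx, 1, [nGroups, 1]);
fracNine = mean(clusterSize == 9);        % fraction of clusters with exactly 9 points
sizeCounts = accumarray(clusterSize, 1);  % sizeCounts(k) = number of clusters with k points
fprintf('%.1f%% of clusters contain exactly 9 points\n', 100*fracNine);
```

Plotting sizeCounts with bar would give the same kind of histogram as the attached one.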

If I know how many points should be in each cluster (9) and I know the inter-point distance within a cluster (because the spacing of the 9 data points within a cluster is constant), is there a way to improve on these kmeans results? Can I stipulate that a cluster must contain exactly 9 points? Can I stipulate that the distance between members of a cluster must not exceed a prescribed value?

Another option I'm considering is looping over the kmeans calculation, each time keeping only the clusters that have 9 points and repeating kmeans on the data set with those entries removed. This may work, but it seems there should be a better way, considering I know a decent amount about the data structure.

Thanks for any suggestions!

5 comments (3 older comments not shown)

the cyclist on 3 Sep 2022

Edited: the cyclist on 3 Sep 2022

Glad it worked out for you.

I'm curious which dbscan inputs worked for you (and whether success seemed to depend on getting them right).

Paul Safier on 3 Sep 2022

@the cyclist. I used 9 for minpts since I know every cluster should contain 9 points. For epsilon, I iterated until the output contained the correct number of clusters. I used a smaller test clip for this, and I knew from inspecting a plot how many clusters I needed to get. The value was 2.4276e-5. My clusters are all the same size, so I may have an easier-than-normal problem. Thanks again.
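The epsilon search described in this comment might look roughly like the sketch below. It is not the code actually used: the synthetic test clip, the assumed 3x3 layout of each group, the candidate range, and the names nExpected, candidates, and bestEps are all hypothetical. Only minpts = 9 and the idea of iterating until the cluster count matches come from the comment.

```matlab
% Small synthetic test clip (assumption): 25 well-separated groups of 9 points
[cx, cy] = meshgrid(1:5, 1:5);
nExpected = numel(cx);                    % cluster count known from inspecting a plot
spacing = 0.01;                           % assumed spacing inside a group
[ox, oy] = meshgrid(-1:1);                % assumed 3x3 layout
xy = repelem([cx(:), cy(:)], 9, 1) + repmat(spacing*[ox(:), oy(:)], nExpected, 1);

minpts = 9;                               % every cluster should contain 9 points
candidates = logspace(-4, 0, 41);         % hypothetical search range for epsilon
bestEps = NaN;
for epsilon = candidates
    idx = dbscan(xy, epsilon, minpts);    % idx == -1 marks noise points
    % Accept the first epsilon that yields the expected count with no noise
    if max(idx) == nExpected && all(idx > 0)
        bestEps = epsilon;
        break
    end
end
fprintf('First epsilon giving %d clusters: %g\n', nExpected, bestEps);
```

The accepted value depends entirely on the units and spacing of the data, so a value found on a test clip should only carry over to the full data set if the spacing is the same there.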

Answer 1

the cyclist on 1 Sep 2022

I think you might have success if you try the DBSCAN algorithm instead.
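As a rough illustration of this suggestion, here is a minimal sketch that clusters with dbscan (Statistics and Machine Learning Toolbox) and then averages v per cluster. The synthetic data, the assumed 3x3 layout of each group, the choice epsilon = 3*spacing, and the variable names are assumptions for illustration and would need adapting to the real data.

```matlab
% Synthetic stand-in data (assumption): 100 well-separated groups of 9 points
[cx, cy] = meshgrid(1:10, 1:10);          % hypothetical group centers
nGroups = numel(cx);
spacing = 0.01;                           % assumed spacing inside a group
[ox, oy] = meshgrid(-1:1);                % assumed 3x3 layout
xy = repelem([cx(:), cy(:)], 9, 1) + repmat(spacing*[ox(:), oy(:)], nGroups, 1);
v  = repelem((1:nGroups)', 9);            % hypothetical values, constant within a group

% epsilon is chosen larger than the widest distance inside a group
% (2*sqrt(2)*spacing here) but far smaller than the gap between groups
epsilon = 3*spacing;
minpts  = 9;
idx = dbscan(xy, epsilon, minpts);        % idx == -1 marks noise points

% Average v over each cluster, ignoring any noise points
valid = idx > 0;
meanV = accumarray(idx(valid), v(valid), [], @mean);
clusterSize = accumarray(idx(valid), 1);
fprintf('%d clusters found, %d noise points\n', max(idx), sum(~valid));
```

Unlike kmeans, dbscan does not need the number of clusters up front; the distance limit epsilon plays the role of the "maximum distance between members" constraint asked about in the question.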

Answer 2

Image Analyst on 3 Sep 2022

dbscan_demo.m

See my attached dbscan demo.

1 comment

Paul Safier on 3 Sep 2022

@Image Analyst. I will look this over. Many years back I found another of your demos quite useful. Thanks for making them!
