How to cluster similar strings?

Serbring on 26 Jan 2020
Commented: Serbring on 29 Jan 2020
Hi all,
I have long lists of strings that I collected automatically with a brute-force web-scraping routine. However, many of the strings are very similar, and I would like to shorten the list by showing only the genuinely different names. Is there any way to cluster the strings together? Below is a sample of the list.
Thank you so much.
Best regards.
{'microbiologia agraria' }
{'microbiologia forestale e ambientale' }
{'microbiologia generale' }
{'microbiologia agraria' }
{'microbiologia generale e ambientale' }
{'microbiologia del suolo e del sottosuolo' }
{'nutrition and health: the functional foods'}
{'microbiologia generale e ambientale' }
{'microbial biotechnologies in agroforestry' }
{'microbiologia generale ed ambientale' }
{'microbiologia agraria e forestale' }
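The thread does not say which distance measure is meant, so as an assumption this sketch uses the Levenshtein edit distance, written in Python for illustration. The names `levenshtein`, `normalized_distance` and `unique_names` are hypothetical. It first lowercases the strings and drops exact duplicates, then builds a pairwise distance matrix where the edit distance is divided by the longer string's length. A value of 0 means identical and values near 1 mean completely different strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


names = [
    'microbiologia agraria', 'microbiologia forestale e ambientale',
    'microbiologia generale', 'microbiologia agraria',
    'microbiologia generale e ambientale', 'microbiologia del suolo e del sottosuolo',
    'nutrition and health: the functional foods', 'microbiologia generale e ambientale',
    'microbial biotechnologies in agroforestry', 'microbiologia generale ed ambientale',
    'microbiologia agraria e forestale',
]

# Trim, lowercase and drop exact duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(s.strip().lower() for s in names))

# Symmetric matrix: dist[i][j] compares unique_names[i] with unique_names[j].
dist = [[normalized_distance(a, b) for b in unique_names] for a in unique_names]

for name, row in zip(unique_names, dist):
    print(f"{name:45s}", " ".join(f"{d:.2f}" for d in row))
```

On this sample, removing the two exact duplicates leaves 9 names. "microbiologia generale e ambientale" and "microbiologia generale ed ambientale" are one edit apart.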

Answers (1)

Image Analyst on 26 Jan 2020
  1 comment
Serbring on 29 Jan 2020
Thanks for your reply. I already knew about those distances, but the real problem is what to do with the resulting numbers. I will try to be more specific so that you can understand the basic idea of the algorithm I have developed.
Let's assume I have three strings A, B and C. I computed the pairwise distances between the strings (A-B, A-C, B-C), and then, for each string, I summed its distances to the other two (for A, that is A-B plus A-C). However, I have no idea how to deal with those numbers. Any suggestion is appreciated.
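Summing each string's distances to all the others merges close and far pairs into a single number, so the sum alone cannot show which strings belong together. A common alternative is threshold-based single-linkage clustering. This is an assumed technique, not Serbring's algorithm. It links any two strings whose normalized edit distance falls below a cutoff and treats each connected group as one cluster. The cutoff of 0.3 is a hypothetical value to tune on real data, and the helper names are illustrative.

The per-string sum is still useful inside a cluster. The member with the smallest summed distance to the rest (the medoid) can serve as the single name displayed for the group. In MATLAB, `squareform`, `linkage(Y, 'single')` and `cluster(Z, 'Cutoff', c, 'Criterion', 'distance')` from the Statistics and Machine Learning Toolbox perform comparable grouping on a precomputed distance matrix.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_strings(strings, threshold=0.3):
    """Single linkage: any pair closer than `threshold` ends up in the same
    cluster, directly or through a chain of close pairs."""
    unique = list(dict.fromkeys(s.strip().lower() for s in strings))
    parent = list(range(len(unique)))  # union-find forest over string indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            if normalized_distance(unique[i], unique[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, s in enumerate(unique):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


def representative(cluster):
    """Medoid: the member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda s: sum(normalized_distance(s, t) for t in cluster))


names = [
    'microbiologia agraria', 'microbiologia forestale e ambientale',
    'microbiologia generale', 'microbiologia agraria',
    'microbiologia generale e ambientale', 'microbiologia del suolo e del sottosuolo',
    'nutrition and health: the functional foods', 'microbiologia generale e ambientale',
    'microbial biotechnologies in agroforestry', 'microbiologia generale ed ambientale',
    'microbiologia agraria e forestale',
]

groups = cluster_strings(names, threshold=0.3)
for group in groups:
    print(representative(group), "<-", group)
```

Single linkage can chain loosely related names into one large group. If that happens, lowering the threshold or requiring every pair in a cluster to be close (complete linkage) gives tighter clusters.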
Cheers
Michele
