Finding Likely Duplicate Strings

Question

0 votes

I have an existing database of contact information for various contacts at specified offices across the country (a "lead" list if you will). This database contains information such as first name, last name, etc. In an effort to refresh the database with current information, I have done some manual research and data logging and have compiled a new, separate data set of current contact information for contacts at the same specified offices.

When updating the existing database with the new data, I've noticed that I'm creating "duplicate" contact records quite a bit. The updating algorithm simply looks for an exact match when it references the contact's name in the new, current data set against the contact's name in the old, existing database. The algorithm thinks "Gregory Smith" is not currently in the database because there isn't an exact match, but upon closer inspection "Gregory" IS already in the database as "Greg Smith".

Rather than scanning the database by hand and de-duplicating records as I update it, I would like to know whether there is a MATLAB function that compares two strings and returns how likely they are to be the same. For example, the computer would flag "Gregory Smith" when the database already contains "Greg Smith". This kind of preprocessing would save a lot of time. Any help would be greatly appreciated. Thanks.

1 comment

Zachary Messaglia on 7 May 2018

Were you able to solve this?


Answer 1

Jan on 12 Mar 2014

0 votes

A good first step is to search the File Exchange for string-distance functions:

http://www.mathworks.com/matlabcentral/fileexchange/index?utf8=%E2%9C%93&term=String+distance
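
To give an idea of what a string-distance function of that kind does, here is a minimal sketch (not any particular File Exchange submission) that computes the Levenshtein edit distance and turns it into a similarity score between 0 and 1. The function names `levenshtein` and `nameSimilarity`, the sample names, and the 0.7 threshold are all assumptions made for illustration; the threshold in particular would need tuning on real data. The code expects char vectors, and local functions in a script need R2016b or later.

```matlab
% Hypothetical sample data; replace with the real name lists.
existing  = {'Greg Smith', 'Ann Lee', 'Robert Jones'};
incoming  = {'Gregory Smith', 'Anne Lee', 'Maria Lopez'};
threshold = 0.7;   % assumed cutoff, tune on your own data

for i = 1:numel(incoming)
    % Score the new name against every name already in the database.
    scores = cellfun(@(s) nameSimilarity(incoming{i}, s), existing);
    [best, idx] = max(scores);
    if best >= threshold
        fprintf('%s: possible duplicate of %s (%.2f)\n', ...
                incoming{i}, existing{idx}, best);
    else
        fprintf('%s: no close match\n', incoming{i});
    end
end

function s = nameSimilarity(a, b)
% Similarity in [0, 1]: 1 minus edit distance divided by the longer length.
% Case and surrounding whitespace are ignored.
    a = lower(strtrim(a));
    b = lower(strtrim(b));
    maxLen = max(length(a), length(b));
    if maxLen == 0
        s = 1;                               % two empty strings are identical
    else
        s = 1 - levenshtein(a, b) / maxLen;
    end
end

function d = levenshtein(a, b)
% Levenshtein distance between char vectors a and b (dynamic programming).
    m = length(a);
    n = length(b);
    D = zeros(m + 1, n + 1);
    D(:, 1) = (0:m)';                        % cost of deleting i characters
    D(1, :) = 0:n;                           % cost of inserting j characters
    for i = 1:m
        for j = 1:n
            cost = double(a(i) ~= b(j));     % 0 if characters match, else 1
            D(i + 1, j + 1) = min([D(i, j + 1) + 1, ...   % deletion
                                   D(i + 1, j) + 1, ...   % insertion
                                   D(i, j) + cost]);      % substitution
        end
    end
    d = D(m + 1, n + 1);
end
```

With this scoring, "Gregory Smith" and "Greg Smith" are three edits apart, which gives a similarity of about 0.77, so the pair would be flagged for review rather than inserted as a new record. Edit distance only catches spelling-level variation; nicknames such as "Bob" for "Robert" would need a separate lookup table.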

