
Extract email addresses from text

Gustav Essunger on 8 Jun 2017
Commented: Adam Danz on 2 Jun 2021
Hello everyone!
Does anyone have a script (or know how to create one) that extracts email addresses from a text string?
(Similar in function to this website: http://www.procato.com/mailextract/)
Thanks in advance! Gustav
  1 comment
Steven Lord on 9 Jun 2017
It's likely more complicated than you think. See for example this Microsoft blog and this Stack Overflow question. One of the answers on the Stack Overflow page links to a (five-year-old) page giving a regular expression that I suspect you could use with the regular expression functionality in MATLAB.


Answers (2)

Stephen23 on 8 Jun 2017
Edited: Stephen23 on 2 Jun 2021
Take MATLAB's own Regular Expressions example:
email = '[a-z_]+@[a-z]+\.(com|net)';
and adapt it to allow any domain, or whatever other requirements you have:
rgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';
C = regexpi(txt,rgx,'match');  % txt: text to search; C: cell array of matched addresses
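A minimal sketch of the difference between the two patterns, assuming a made-up sample string `txt`. The documentation pattern is also run through `regexpi` here, so only the pattern itself differs between the two calls:

```matlab
% Hypothetical sample text containing two addresses
txt = 'Write to jdoe@mathworks.com or Support2@Example.org';

% Pattern from the MATLAB documentation: letters/underscore only, .com or .net only
email = '[a-z_]+@[a-z]+\.(com|net)';
% Adapted pattern: digits allowed, any chain of dot-separated domain labels
rgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';

% regexpi ignores case; 'match' returns a cell array of the matched substrings
docMatches = regexpi(txt, email, 'match')
C = regexpi(txt, rgx, 'match')
```

Here `docMatches` holds only 'jdoe@mathworks.com', because the second address has a digit in its local part and a .org domain, while `C` contains both addresses.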
For a slightly stricter version, you can find many regular expressions on the internet, e.g.:
rgx = '[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+';
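To see what the extra characters in the stricter pattern buy you, here is a small comparison on a hypothetical sample text with a dotted local part and a hyphenated domain:

```matlab
% Hypothetical text: dotted local part, hyphenated subdomain
txt = 'Contact first.last@example.com or web@sub-domain.example.org';

simpleRgx = '[a-z0-9_]+@[a-z0-9]+(\.[a-z0-9]+)+';
strictRgx = '[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+';

simpleMatches = regexpi(txt, simpleRgx, 'match')  % truncates the first, misses the second
strictMatches = regexpi(txt, strictRgx, 'match')  % finds both complete addresses
```

The simple pattern returns only 'last@example.com' (the '.' before 'last' is not allowed in its local part) and finds nothing in the hyphenated domain, whereas the stricter pattern returns both full addresses.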
While this simple regular expression works for simple email addresses, it is worth noting that the complete rules for checking valid email addresses are not trivial to implement with a regular expression.
A common mistake (including by this answer) is to exclude non-latin characters.
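A sketch of that mistake in practice, assuming a hypothetical address with a non-Latin letter. The looser pattern `looseRgx` is an assumption of this sketch, not part of the answer: it simply takes any non-whitespace, non-'@' characters on both sides of the '@':

```matlab
% Hypothetical text whose local part contains a non-Latin letter
txt = 'Write to müller@example.de';

% Stricter ASCII-only pattern from the answer above
strictRgx = '[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+';
% Looser alternative (assumption, not from the answer): anything except
% whitespace and '@' on each side, with at least one dot in the domain
looseRgx = '[^\s@]+@[^\s@]+\.[^\s@]+';

asciiOnly = regexpi(txt, strictRgx, 'match')  % local part is cut off at the 'ü'
loose = regexp(txt, looseRgx, 'match')        % keeps the whole address
```

The ASCII-only pattern silently returns 'ller@example.de'. The looser pattern keeps 'müller@example.de', but it validates almost nothing: it also accepts 'a@b.c' and would include a sentence-final period in the match.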
  2 comments
Gustav Essunger on 9 Jun 2017
Thank you very much!!
Adam Danz on 2 Jun 2021
Don't forget Mailojis! 😁



oliver on 15 Feb 2019
Edited: oliver on 15 Feb 2019
I think the above examples will miss quite a lot of addresses, such as all those containing a capital letter or an apostrophe, like Peter.O'Toole@xyz.com. So my suggestion would be something like:
reg='[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
However, given the wide variety of new top-level domains (TLDs) nowadays, limiting the last character group to 2-4 letters may be too restrictive, depending on your needs.
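A small sketch of both points, using a made-up sample text. `regAnyTld` is a hypothetical variant of the pattern above that only removes the upper limit on the last label:

```matlab
% oliver's pattern: '' inside a MATLAB char vector is a literal apostrophe
reg = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
% Hypothetical variant: last label may be 2 or more letters long
regAnyTld = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,})';

txt = 'Write to Peter.O''Toole@xyz.com or info@example.museum';

% regexp is case-sensitive, hence the explicit A-Z ranges in both patterns
limited = regexp(txt, reg, 'match')          % '.museum' comes back as '.muse'
unlimited = regexp(txt, regAnyTld, 'match')  % full 'info@example.museum'
```

Both patterns pick up the apostrophe address, but with the `{2,4}` limit the regex engine backtracks and returns the truncated 'info@example.muse' instead of failing, which is easy to miss.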
  1 comment
Daniele Lupo on 2 Jun 2021
This regexp accepts an invalid address with consecutive dots, such as "my.email@email...com".
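A sketch reproducing this, plus one possible fix. `labelReg` is a hypothetical pattern, not from the thread: it builds the domain from non-empty, dot-separated labels, so two dots in a row cannot match:

```matlab
txt = 'my.email@email...com';

% oliver's pattern: the domain class contains '.', so '...' slips through
reg = '[a-zA-Z0-9._%''+-]+@([a-zA-Z0-9._-])+\.([a-zA-Z]{2,4})';
% Hypothetical fix: each domain label needs at least one non-dot character
labelReg = '[a-zA-Z0-9._%''+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}';

accepted = regexp(txt, reg, 'match')       % matches the invalid address
rejected = regexp(txt, labelReg, 'match')  % empty cell: no match
valid = regexp('my.email@mail.example.com', labelReg, 'match')
```

Note that this only tightens the domain: the local part still allows consecutive dots, and labels may still start or end with a hyphen.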

