How do I use regexp to extract text between numbers

Ean Hendrickson on 9 Nov 2019
Edited: per isakson on 9 Nov 2019
I have a string that I extracted from a PDF:
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
How can I use the function regexp to extract all the descriptions between the numbers and put them into a 16x1 array? The end product I want is a 16x1 string array that looks like this:
  1. Receptacles, general purpose.
  2. Receptacles with integral GFCI.
  3. USB Charger receptacles.
  4. AFCI receptacles.
  5. Twist-locking receptacles.
  6. Isolated-ground receptacles.
  7. Tamper-resistant receptacles.
  8. Weather-resistant receptacles.
  9. Pendant cord-connector devices.
  10. Cord and plug sets.
  11. Wall box dimmers.
  12. Wall box dimmer/sensors.
  13. Wall box occupancy/vacancy sensors.
  14. Toggle Switches.
  15. Floor service outlets.
  16. Associated device plates.
I also have this line of code
parts = regexp(str,'^\d*+.*$','dotexceptnewline','lineanchors');
which finds the index of each number in the string. I think I could then use those index values in a for loop to extract the text between the numbers.
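A minimal sketch of that index-based idea, assuming the items are separated by the ↵ character (U+21B5) and that no description contains a digit followed by a period. The sample string is shortened to two items, and the names delim, startIdx, stopIdx, and items are illustrative only.

```matlab
% Assumption: items are separated by U+21B5; sample text is shortened.
delim = char(8629);                                % 8629 == hex2dec('21B5')
str = string([delim '1. Wall box dimmers. ' delim '2. Toggle Switches. ' delim delim]);

startIdx = regexp(str, '\d+\.', 'start');          % start index of each "N." label
stopIdx  = [startIdx(2:end) - 1, strlength(str)];  % each item ends before the next label

items = strings(numel(startIdx), 1);               % preallocate a column string array
for k = 1:numel(startIdx)
    piece = extractBetween(str, startIdx(k), stopIdx(k));
    items(k) = strip(erase(piece, delim));         % drop the delimiter, then trim blanks
end
```

The loop works, but it relies on the "N." labels being the only digit-plus-period sequences in the text; splitting on the delimiter avoids that dependency.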
  4 comments
Rik on 9 Nov 2019
Is this the exact text of your char array? Or are there actually some char(10) in there?
Ean Hendrickson on 9 Nov 2019
This is the exact text I extracted from a PDF. There should be no char(10) in there. I used extractFileText, strfind, and extractBetween to get the above text.


Answers (2)

per isakson on 9 Nov 2019
Edited: per isakson on 9 Nov 2019
"So the end product I want will be a 16x1 string that looks like" I'm not sure exactly how to understand your requirement.
The problem is the delimiter, which looks a bit like the character on my ENTER key (↵). After copying and pasting from your question, the hexadecimal code of that character is 21B5, written \x21B5 in a regular expression.
Try
%%
z = regexp( str, "\x21B5+", 'split' );    % split on runs of the ↵ character
z = strtrim( z );                         % trim leading and trailing blanks
z( isstring(z) & strlength(z)==0 ) = [];  % drop the empty pieces
%%
% z = regexp( z, "(?<=\d+\.\x20).+$", 'match', 'once' ); % removes the numbers
out = reshape( z, [],1 );                 % make it a column
%%
fprintf( 1, '%s\n', out );
outputs in the command window
1. Receptacles, general purpose.
2. Receptacles with integral GFCI.
3. USB Charger receptacles.
4. AFCI receptacles.
5. Twist-locking receptacles.
6. Isolated-ground receptacles.
....
and
>> out(1:4)
ans =
4×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."

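For comparison, the same splitting can be sketched without regexp, using split, strip, and regexprep. This is only a sketch under the assumption that the delimiter is U+21B5. The sample string is shortened to two items, and the names delim, items, and descriptions are illustrative, not from the thread.

```matlab
% Assumption: items are separated by U+21B5; sample text is shortened.
delim = char(8629);                                % 8629 == hex2dec('21B5')
str = string([delim delim '1. Receptacles, general purpose. ' delim ...
              '2. Toggle Switches. ' delim delim]);

items = strip(split(str, delim));                  % split at every ↵, trim blanks
items(strlength(items) == 0) = [];                 % drop the empty pieces

% Optional: remove the leading "N. " label from each item.
descriptions = regexprep(items, '^\d+\.\s*', '');
```

Here split already returns a column, so no reshape is needed.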
JESUS DAVID ARIZA ROYETH on 9 Nov 2019
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
parts = regexp(str,'\d+\. +[.\w,-/\s]+\.','match')'
parts =
16×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
"5. Twist-locking receptacles."
"6. Isolated-ground receptacles."
"7. Tamper-resistant receptacles."
"8. Weather-resistant receptacles."
"9. Pendant cord-connector devices."
"10. Cord and plug sets."
"11. Wall box dimmers."
"12. Wall box dimmer/sensors."
"13. Wall box occupancy/vacancy sensors."
"14. Toggle Switches."
"15. Floor service outlets."
"16. Associated device plates."

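The character class in the pattern above lists the characters allowed inside a description, so text containing anything else (parentheses, for example) would be cut short. A possible variant, sketched here under the assumption that the delimiter is U+21B5, accepts any character except the delimiter. The sample string is shortened, and the names delim and pattern are illustrative.

```matlab
% Assumption: items are separated by U+21B5; sample text is shortened.
delim = char(8629);                                % 8629 == hex2dec('21B5')
str = string([delim '12. Wall box dimmer/sensors. ' delim ...
              '13. Wall box occupancy/vacancy sensors. ' delim delim]);

% "N. " followed by any run of non-delimiter characters, ending at a period.
pattern = ['\d+\.\s+[^' delim ']*\.'];
parts = regexp(str, pattern, 'match')';            % transpose to a column
```

Because the quantifier is greedy, each match extends to the last period before the next delimiter, so periods inside a description do not end the match early.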