Delete rows with characters in cell array
I need some basic help. I have a cell array:
- 1 TITLE 13122423
- 2 NAME Bob
- 3 PROVIDER James
and many more rows with text...
- 44 234 456 234 345
- 45 324 346 234 345
- 46 344 454 462 435
and many MANY (>4000) more with only numbers
- 4100 text
- 4101 text
and more text and mixed entries
Now what I want is to delete all the rows where the first column contains a character, and end up with only those rows containing numbers: rows 44 to 46 in this example.
I tried to use
rawdataTruncated(strncmp(rawdataTruncated(:, 1), 'A', 1), :) = [];
but then I need to go through the whole alphabet, right?
3 comments
the cyclist
17 Aug 2017
Edited: the cyclist
17 Aug 2017
Rather than describing what you have, it would be stupendously easier for us if you just defined a representative cell array with code, like
C = {'1 TITLE 13122423';
'44 234 456 234 345';
'4100 text'};
and then what you want the output to be.
So ... uh ... what would the output be from this array I just defined? It seems like maybe just the 2nd row.
Also, where you have written the numbers 1-3, 44-46, and 4100-4101, are those part of the cell array, or are you just indicating line numbers here?
JB
17 Aug 2017
Hi JB
Are the numbers positive integers only?
Or could the input data have lines like these?
1. negative figures
'45 -65 2345 43'
2. decimals
'234 0.345 -2.4'
3. fractions
'12 50/3456'
4. lines with numbers and operations
'35 45+67-3456/4356*3.2 '
Accepted Answer
More Answers (5)
the cyclist
17 Aug 2017
I feel like there is a way to do this without resorting to cellfun, and just using regexp, but I'm drawing a blank. However, here is one way.
C = {'TITLE 13122423';
'234 456 234 345';
'text'};
cellfun(@(x)not(isempty(x)),regexp(C,'^[0-9][0-9 ]+'))
regexp is checking that each cell starts with a numeric character, and has only spaces and numbers after that.
4 comments
+1 Simpler:
>> idx = cellfun('isempty',regexp(C,'^[0-9][0-9 ]+'));
>> C(~idx)
ans =
'234 456 234 345'
Wrong:
A line containing '-' renders this single-line attempt, like the cyclist's, useless, because idx comes back all false for the whole cell array:
A={'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495- 3725'}
idx = cellfun('isempty',regexp(A,'^[0-9][0-9 ]+'))
idx =
9×1 logical array
0
0
0
0
0
0
0
0
0
@John BG: luckily regular expressions are trivial to alter, as long as the requirements are clearly specified. Note that the original question does not show - in its example, and JB states "...end up with only those rows containing numbers", so we would have to ask JB exactly what characters are allowed in the output numbers: I certainly don't know what data JB is working with. Do you?
>> idx = cellfun('isempty',regexp(A,'^[0-9][-0-9 ]+$'));
>> A(~idx)
ans =
'44 234 456 234 345'
'48 344 454 462 435'
'4103 0495- 3725'
Note that using regexp is much more efficient (in terms of total coding, debugging, and running time) than trying to write your own string parser, as some beginners would try to do.
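To illustrate how the pattern might be altered, here is a minimal sketch that assumes signed and decimal numbers are allowed. That is only an assumption: JB has not said what the real data looks like. The sample lines and the variable names num, pat and keep are made up for this example.

```matlab
% Hypothetical sample data (not JB's), chosen to exercise each case.
A = {'44 234 456';
     '45 -65 2345 43';
     '234 0.345 -2.4';
     '12 50/3456';
     '4100 text'};

% One number: optional sign, digits, optional fractional part.
num = '[-+]?\d+(\.\d+)?';

% A whole line: one number, then zero or more whitespace-separated numbers,
% anchored at both ends so nothing else may appear on the line.
pat = ['^' num '(\s+' num ')*$'];

keep = ~cellfun('isempty', regexp(A, pat, 'once'));
filteredA = A(keep);
```

With this pattern the fraction line and the text line are rejected, because '/' and letters are not part of the assumed number format. If fractions or arithmetic expressions should count as numbers, the pattern would need to change again.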
Guillaume
18 Aug 2017
The simple addition of a $ at the end of the regular expression is all that is needed to fix the problem.
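A small side-by-side sketch of what the $ changes, using a made-up two-line cell array (the names unanchored and anchored are only for illustration):

```matlab
% Hypothetical two-line example: one purely numeric line, one mixed line.
C = {'44 234 456';
     '4100 text'};

% Without $, a match only has to cover the start of the line,
% so '4100 ' is enough for the second line to count as a match.
unanchored = ~cellfun('isempty', regexp(C, '^[0-9][0-9 ]+', 'once'));

% With $, the match has to run all the way to the end of the line,
% so any character other than a digit or a space rules the line out.
anchored = ~cellfun('isempty', regexp(C, '^[0-9][0-9 ]+$', 'once'));
```

Here unanchored is true for both lines, while anchored is true only for the first.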
Guillaume
18 Aug 2017
As said, ignore attempts to build your own custom string parser; it's inefficient and a complete waste of time. Use a regular expression.
This one works; John BG can attempt to find flaws in it to his heart's content:
A = {'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495- 3725'}
isjustnumbers = ~cellfun('isempty', regexp(A, '^[0-9][0-9 ]+$'));
filteredA = A(isjustnumbers)
5 comments
Stephen23
18 Aug 2017
@Guillaume: +1 I don't think a simple $ character has ever entertained me so much. Thank you!
John BG
18 Aug 2017
yes, keep patching
Stephen23
18 Aug 2017
@John BG: are you suggesting that your own code is perfect, and never needs to be fixed?
s1=s1+1; $ shift pointer while ' ' or number
???
Jan
18 Aug 2017
+1. It works and is elegant.
José-Luis
18 Aug 2017
+1. Simple and efficient.
Tongue in cheek: That moment when MATLAB Answers starts feeling like a battlefield...
If you want to remove all lines that contain any character that is neither a digit nor white space:
index = cellfun(@(ac) all(isstrprop(ac, 'digit') | ...
isstrprop(ac, 'wspace')), C);
C = C(index)
A loop is faster than cellfun:
B = true(size(A));
for k = 1:numel(A)
B(k) = ~all(isstrprop(A{k}, 'digit') | ...
isstrprop(A{k}, 'wspace'));
end
A(B) = [];
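To see what the isstrprop test does for a single line, here is a small sketch on one string from the example data used in this thread. The variable names are made up for the illustration.

```matlab
s = '4102 2more text';              % one line from the thread's example data

isDigit = isstrprop(s, 'digit');    % true at each digit character
isSpace = isstrprop(s, 'wspace');   % true at each white-space character
allowed = isDigit | isSpace;        % true wherever the character is acceptable

onlyNumbers = all(allowed);         % false here, because of 'more' and 'text'
offending   = s(~allowed);          % the characters that caused the rejection
```

One edge case to be aware of: for an empty char the mask is empty and all of an empty array is true, so an empty line would be kept by this test.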
2 comments
Alan Peters
19 Aug 2017
I love your profile picture, Jan. I think I share that expression half the time I'm trying to solve a problem with Matlab!
Walter Roberson
19 Aug 2017
(Jan used to have a different profile picture. He also used to get a lot of email requests from people. One day I teased him that it was because he had such a handsome and elegant profile picture. Jan changed his picture to the funny face you see now. The number of email requests he got dropped considerably. ;-) )
Hi JB
Thanks for pointing out the code supplied by gnovice.
I just tried it, and with gnovice's code characters like '@' are treated as numbers.
Let's say that there's a line like
'4103 0495@ 3725'
then
A={'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495@ 3725'}
index = ~any(cellfun(@any, isstrprop(A, 'alpha')), 2); C = A(index, :)
C =
3×1 cell array
'44 234 456 234 345'
'48 344 454 462 435'
'4103 0495@ 3725'
My answer takes only numbers into account, as requested:
A={'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495@ 3725'}
B=[]; % log to record locations of lines to delete
L0='0123456789';
L1=' 0123456789';
for k=1:1:length(A)
L=A{k,:}
s1=1;while L(s1)~=' ' % start from left until find ' '
s1=s1+1;
end
while L(s1)==' ' % in case more than one consecutive ' '
s1=s1+1
end
while s1<length(L) && ~isempty(strfind(L1,L(s1)))
s1=s1+1; % shift pointer while ' ' or number
end
if isempty(strfind(L0,L(s1))) % check
B=[B k];
end
end
A(B)=[]
A =
2×1 cell array
'44 234 456 234 345'
'48 344 454 462 435'
.
If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?
To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link
thanks in advance
John BG
9 comments
JB
18 Aug 2017
@JB, you'd be better off forgetting about this solution altogether: it's extremely long-winded, inefficient and unnecessarily complicated. It's basically a poorly formed attempt at building a string parser, when MATLAB already has built-in ones.
John BG
18 Aug 2017
Edited: John Kelly
1 Sep 2017
Hi JB
Where it says
s1=s1+1; $ shift pointer while ' ' or number
it should read
s1=s1+1; % shift pointer while ' ' or number
Typing error, sorry.
Look what happens when one of the lines contains the character '-':
A={'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495- 3725'}
cellfun(@(x)not(isempty(x)),regexp(A,'^[0-9][0-9 ]+'))
ans =
9×1 logical array
1
1
1
1
1
1
1
1
1
It returns the complete cell!
John BG
18 Aug 2017
Edited: John Kelly
1 Sep 2017
A={'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
'48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495- 3725'}
idx = cellfun('isempty',regexp(A,'^[0-9][0-9 ]+'))
idx =
9×1 logical array
0
0
0
0
0
0
0
0
0
John BG
18 Aug 2017
Hi JB
The single-line approach eliminates lines that contain only numbers but start with a ' ',
like the 5th line, ' 48 344 454 462 435'.
A = {'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
' 48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495 3725'}
~cellfun('isempty', regexp(A, '^[0-9][0-9 ]+$'))
ans =
9×1 logical array
0
0
0
1
0
0
0
0
1
The question reads
' .. and end up with only those rows containing numbers'
The code that I have supplied keeps lines that, as requested, contain only numbers, even when spaces come before the numbers.
A = {'1 TITLE 13122423';
'2 NAME Bob';
'10 PROVIDER James';
'44 234 456 234 345';
' 48 344 454 462 435';
'4100 text';
'4101 text';
'4102 2more text';
'4103 0495 3725'}
B=[]; % log to record locations of lines to delete
L0='0123456789';
L1=' 0123456789';
for k=1:1:length(A)
L=A{k,:}
s1=1;while L(s1)~=' ' % start from left until find ' '
s1=s1+1;
end
while L(s1)==' ' % in case more than one consecutive ' '
s1=s1+1
end
while s1<length(L) && ~isempty(strfind(L1,L(s1)))
s1=s1+1; % shift pointer while ' ' or number
end
if isempty(strfind(L0,L(s1))) % check
B=[B k];
end
end
A(B)=[]
..
A =
3×1 cell array
'44 234 456 234 345'
' 48 344 454 462 435'
'4103 0495 3725'
regards
Stephen23
18 Aug 2017
Edited: John Kelly
1 Sep 2017
In case anyone else cares:
>> idx = cellfun('isempty',regexp(A,'^[0-9 ]+$'));
>> A(~idx)
'44 234 456 234 345'
' 48 344 454 462 435'
'4103 0495 3725'
Jan
18 Aug 2017
Edited: John Kelly
1 Sep 2017
The handling of leading spaces in your code is not efficient, but this problem does not occur in the original question. The two loops while L(s1)~=' ' and while L(s1)==' ' are fragile: they fail if the string does not contain any space, or if only spaces follow an initial sequence of non-space characters. This loop version could be simplified:
B = true(size(A));
L = ' 0123456789';
for k = 1:numel(A)
B(k) = ~all(ismember(A{k}, L));
end
A(B) = [];
But I prefer regexp and isstrprop methods.
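A short sketch of the ismember loop on the edge cases mentioned above. The test strings are made up for the illustration and are not from the question.

```matlab
% Hypothetical edge cases: no space at all, trailing spaces after text,
% and a numeric line with a leading space.
A = {'12345';
     'abc';
     'abc   ';
     ' 48 344'};
allowedChars = ' 0123456789';

remove = true(size(A));
for k = 1:numel(A)
    % ismember tests every character of the line at once,
    % so there is no pointer that could run past the end of the string.
    remove(k) = ~all(ismember(A{k}, allowedChars));
end
A(remove) = [];
```

Only '12345' and ' 48 344' remain, and no line triggers an indexing error.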
Steven Lord
18 Aug 2017
I recommend everyone except the original poster JB take a step back from this question for a little while. Remember that one of the tips for a helpful answer is "Be honest and considerate with all responses to all contributors." and a little breather might help make that easier.
JB, please add a comment to the original question indicating if you are satisfied with a combination of one or more of the responses you've received. If you aren't, clarify what aspect of the problem has not yet been solved. If you feel one of the answers was most useful in solving the problem, consider accepting it.
John BG
18 Aug 2017
Steven Lord
Your intervention is highly appreciated.
Too many comments often clutter the answers, and the question originator, here JB, may decide just to walk away from such discussions. They sometimes diverge into side considerations of speed and code compactness, while it's not yet known whether JB finds any of the code useful at all.
Awaiting JB's response