How to search a substring in a list of strings?
I have {'xx', 'abc1', 'abc2', 'yy', 'abc100'} and I would like to search for 'abc' and get back {'abc1', 'abc2', 'abc100'}. Is there a simple way to do this without a for loop?
Answers (4)
Jos (10584)
on 29 Jan 2018
In recent releases (R2016b and later) you can use startsWith:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'}
tf = startsWith(A,'abc')
B = A(tf)
See the documentation on string functions for many other utilities that may be useful for you.
Stephen23
on 29 Jan 2018
Much faster than using cellfun or any string functions:
>> C = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
>> Z = C(strncmp(C,'abc',3));
>> Z{:}
ans = abc1
ans = abc2
ans = abc100
Jan
on 29 Jan 2018
Edited: Jan
on 29 Jan 2018
If you want to search at the start of the strings only, this is efficient:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
B = A(strncmp(A, 'abc', 3));
Some timings:
% Some larger test data:
A = repmat({'xx', 'abc1', 'abc2', 'yy', 'abc100'}, 1, 1000);
S = string(A);
tic;
for k = 1:1000
tf = startsWith(A, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = startsWith(S, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = strncmp(A, 'abc', 3);
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = cellfun(@any,strfind(A, 'abc'));
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = ~cellfun('isempty', strfind(A, 'abc'));
B = A(tf);
end
toc
Elapsed time is 1.492006 seconds. % startsWith(cell string)
Elapsed time is 0.308345 seconds. % startsWith(string)
Elapsed time is 0.018157 seconds. % strncmp
Elapsed time is 8.095714 seconds. % cellfun(@any, strfind)
Elapsed time is 1.706694 seconds. % cellfun('isempty', strfind)
Note that the two cellfun/strfind methods search for the substring anywhere in the strings, while startsWith and strncmp search at the start only. With modern string methods, a search anywhere in the strings would be:
tf = contains(A, 'abc');
This runs at about the same speed as startsWith.
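To make the difference between matching at the start and matching anywhere concrete, here is a minimal sketch. The cell array `C` and its extra entry 'xabc' are made up for illustration and are not part of the question; startsWith and contains assume R2016b or later.

```matlab
% Hypothetical test data: 'xabc' contains 'abc' but does not start with it.
C = {'xx', 'abc1', 'xabc', 'yy', 'abc100'};

atStart  = startsWith(C, 'abc');   % true where the element starts with 'abc'
atStart2 = strncmp(C, 'abc', 3);   % same mask: compares the first 3 characters
anywhere = contains(C, 'abc');     % true where 'abc' occurs at any position

startOnly = C(atStart);    % {'abc1', 'abc100'}
anyPos    = C(anywhere);   % {'abc1', 'xabc', 'abc100'}
```

For the data in the question the two kinds of search return the same elements, because every match happens to sit at the start of its string.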
@MathWorks: strncmp on cell strings is 17 times faster than startsWith on strings. The conversion from cell strings to strings inside startsWith makes it run about 80 times slower than strncmp (1.49 s versus 0.018 s in the timings above). There is great potential for improvement.
Fangjun Jiang
on 29 Jan 2018
s={'xx', 'abc1', 'abc2', 'yy', 'abc100'};
index=cellfun(@any,strfind(s,'abc'));
s(index)
2 comments
Jan
on 29 Jan 2018
Edited: Jan
on 29 Jan 2018
@Matt J: But cellfun is at least a fast C-mex function. Every function that processes a cell string must contain a loop somewhere. The problem is using cellfun with an expensive function handle such as @any. The string syntax is about 4 times faster (but still slower than startsWith; see my answer for timings):
index = ~cellfun('isempty', strfind(s, 'abc'));
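As a quick check that the two cellfun variants select the same elements, here is a small sketch using the cell array from the answer above. The variable names `hits`, `viaHandle` and `viaName` are only illustrative.

```matlab
s = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};

% strfind returns a cell array: start indices per element, [] where there is no match
hits = strfind(s, 'abc');

viaHandle = cellfun(@any, hits);        % function handle, called once per cell
viaName   = ~cellfun('isempty', hits);  % older string syntax for the same test

sameMask = isequal(viaHandle, viaName); % true: both give the same logical mask
matches  = s(viaName);                  % {'abc1', 'abc2', 'abc100'}
```

Both masks agree because every index returned by strfind is at least 1, so `any` is true exactly where the result is non-empty.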