How to search for multiple words anywhere in a sentence?

Amr Hashem on 19 Sep 2015
Commented: Cedric on 22 Sep 2015
I want to search for three words: "battery", "power" and "failure". All three must appear in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); %save rows which didn't contain
but this matches any cell that contains at least one of the three words.
How can I find the cells that contain all three words, in any order?
  2 comments
Amr Hashem on 19 Sep 2015
Where is per isakson's comment?
Amr Hashem on 19 Sep 2015
Is there a function that does this instead of regexpi?


Answers (3)

the cyclist on 19 Sep 2015
The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
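A minimal sketch of that idea, assuming every cell of D holds text (the sample data and the variable names hasAll and rowHasAll are made up for illustration): run one case-insensitive search per word and AND the resulting logical masks.

```matlab
% Hypothetical sample data; in the question D would be alldata(:,126:130).
D = {'Battery power failure',   'power ok'; ...
     'failure of power supply', 'battery low'; ...
     'no issue',                'Power failure, battery dead'};

words = {'battery', 'power', 'failure'};

% Start with "all true" and AND in one mask per keyword.
hasAll = true(size(D));
for k = 1:numel(words)
    found  = ~cellfun('isempty', regexpi(D, words{k}, 'once'));
    hasAll = hasAll & found;
end

% Rows in which at least one cell contains all three words.
rowHasAll = any(hasAll, 2);
```

Each search still scans every cell, so this is three passes over the data; the masks are combined only at the end.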
  2 comments
Amr Hashem on 19 Sep 2015
Thanks for the idea, but that takes more time.
Amr Hashem on 20 Sep 2015
Thanks to you all.
I took your advice "to do the regexp search three times, once for each word"
and tried this:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
But I am still wondering: the three words may not be in the same cell.
I mean, for example, column 126 = battery, column 127 = power, column 128 = failure.
Overall, though, the code now looks good :)
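On that remaining doubt: the code above tests each word against the whole row, so a row also passes when the three words sit in different cells. A hedged sketch of how the two cases could be told apart, using made-up data and hypothetical variable names (found, sameCell, sameRow):

```matlab
% Hypothetical stand-in for alldata(:,126:130); non-char cells are allowed.
D = {'battery',               'power', 'failure'; ...
     'battery power failure', 42,      ''; ...
     'battery',               'power', []};

words  = {'battery', 'power', 'failure'};
isText = cellfun('isclass', D, 'char');

% found(i,j,k) is true where cell (i,j) contains words{k}.
found = false([size(D), numel(words)]);
for k = 1:numel(words)
    hit = false(size(D));
    hit(isText) = ~cellfun('isempty', regexpi(D(isText), words{k}, 'once'));
    found(:,:,k) = hit;
end

sameCell = any(all(found, 3), 2);   % all three words inside a single cell
sameRow  = all(any(found, 2), 3);   % every word somewhere in the row
```

With this sample, the first row satisfies only sameRow, the second satisfies both, and the third satisfies neither.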



per isakson on 19 Sep 2015
Edited: per isakson on 20 Sep 2015
Try this:
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
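Each of is1 to is4 is a 1-by-3 logical vector with one entry per word, so a sentence contains all three words exactly when every entry is true. A small sketch of that last step (the helper name has_all is hypothetical, not part of the answer):

```matlab
words = {'battery','power','failure'};

% Hypothetical helper: true when every word occurs as a whole word in txt.
has_all = @(txt) all(cellfun( ...
    @(str) ~isempty(regexpi(txt, ['\<', str, '\>'], 'once')), words));

r1 = has_all('abc battery def power ghi failure');   % all three present
r2 = has_all('abc battery def power ghi faiXure');   % "failure" is misspelled
r3 = has_all('Battery def power ghi failure.');      % case is ignored
r4 = has_all('abc Xbattery def power ghi failure');  % "battery" is not a whole word
```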
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%
text_corp = repmat( test_sentences, [N,1] );
tic
% Group the alternation so the word anchors \< and \> apply to every keyword.
cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' );
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end
  12 comments
per isakson on 20 Sep 2015
I added new code to my answer.
Amr Hashem on 20 Sep 2015
I noticed, thanks.



Amr Hashem on 20 Sep 2015
This works:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
  1 comment
Cedric on 22 Sep 2015
This can be simplified as developed in my answer. I move it below as a comment:
Here is an alternate solution:
keywords = {'battery', 'power', 'failure'} ;
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
'V_batterypowerfailure', 'I_atterypowerfailure'; ...
'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;
for k = 1 : numel( keywords )
isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
ids = ids(isFound) ;
end
validCells = allCells(ids) ;
You'll notice that it works on a pool of cells which reduces with the keyword index (as when a keyword is not found, there is no point in testing the others). I started valid entries of the dummy data set with V_ and invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )
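Put together, the case-insensitive variant of the loop might look like the sketch below. The sample data is made up for illustration and differs from the V_/I_ set above.

```matlab
keywords = {'battery', 'power', 'failure'};
allCells = {'Battery POWER failure', 'battery failure'; ...
            'power and FAILURE',     'Failure: battery power'};

ids = 1:numel(allCells);               % linear indices of candidate cells
for k = 1:numel(keywords)
    isFound = ~cellfun('isempty', regexpi(allCells(ids), keywords{k}, 'once'));
    ids = ids(isFound);                % keep only cells that matched so far
end
validCells = allCells(ids);
```

Because ids holds linear (column-major) indices, the surviving entries here are cells 1 and 4 of allCells.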
