Find index with multiple conditions, using find function

Shayma on 21 Sep 2016
Commented: chakradhar Reddy Vardhireddy on 25 Sep 2018
Hi all,
I'm stuck again; I searched for solutions without success. I have large CSV files (millions of rows, 200 columns of text and numbers) that I can open with "datastore" (for now I work only on the first chunk). I want to create a new file containing every row that satisfies some conditions. The conditions compare only 4 columns against lower and upper bounds held in two 4-element vectors, min_range and max_range. So I wrote this:
ds= datastore(file_r);
new_data=table;
index=[];
while hasdata (ds)
datachunk= read (ds);
index= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
new_data=[new_data;datachunk(index,:)];
end
On the index line I get the error message: Error using & Inputs must have the same size.
Each vector has a different number of elements and I'm looking for the intersection of the 4. Because it's an index, I used "find" to look for the rows that match all 4 conditions. How can I fix that?
If I split it:
z1= find (datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1));
z2= find (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2));
z3= find (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) ;
z4= find (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
z5=intersect(z4,intersect(intersect(z1,z2),z3))
it works, but then I have to reset the values on each run, which does not seem like an efficient way to do it.
Any help with that will be appreciated :)
4 comments
Shayma on 22 Sep 2016
@George you are right! Now it works. @Stephen I managed to do it without this function, but thanks for your reply.
chakradhar Reddy Vardhireddy on 25 Sep 2018
@shayma, could you share the method you used that avoids the above function? I have a similar issue, and your method may be helpful.


Accepted Answer

George on 22 Sep 2016
The first thing I would try is to be more liberal with your use of parentheses. In your statement:
index = find(datachunk.lip_acc >min_range(1) & datachunk.lip_acc<max_range(1)) & (datachunk.lip_don>min_range(2) & datachunk.lip_don<max_range(2)) & (datachunk.logP_o_w_>min_range(3) & datachunk.logP_o_w_<max_range(3)) & (datachunk.Weight>min_range(4) & datachunk.Weight<max_range(4));
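you're closing the find after datachunk.lip_acc<max_range(1), and then logically ANDing its output with the other conditions. find returns a vector of row indices whose length generally differs from the number of rows, which is why & complains about sizes. I think you want the entire statement enclosed in find().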
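A minimal sketch of the parenthesis fix, reduced to two of the four conditions. The table `T`, its values, and the bounds are made up for illustration; only the column names and the `min_range`/`max_range` names come from the question.

```matlab
% Hypothetical stand-in for one chunk read from the datastore.
T = table([1;5;9;3], [2;4;6;8], 'VariableNames', {'lip_acc','lip_don'});
min_range = [0 3];   % assumed lower bounds, one per column
max_range = [6 9];   % assumed upper bounds, one per column

% Original pattern (commented out): find() closes after the first condition,
% so it returns 3 row numbers that are then ANDed with a 4-by-1 logical
% vector, which raises a size error.
% idx_bad = find(T.lip_acc > min_range(1) & T.lip_acc < max_range(1)) & ...
%           (T.lip_don > min_range(2) & T.lip_don < max_range(2));

% Corrected pattern: find() wraps the whole condition, so every & combines
% logical vectors with one element per row.
idx = find((T.lip_acc > min_range(1) & T.lip_acc < max_range(1)) & ...
           (T.lip_don > min_range(2) & T.lip_don < max_range(2)));

selected = T(idx, :);   % rows that satisfy both range checks
```

With this made-up data, rows 2 and 4 pass both checks. The remaining two conditions from the question would be appended inside the same find() call in the same way.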

More Answers (1)

Steven Lord on 21 Sep 2016
I would avoid using find here. Write each of your conditions as separate logical arrays. When you need to index, combine those individual conditions with and, or, not, etc. This way if you encounter unexpected results you can set a breakpoint on the line where you perform the indexing and examine each individual condition to determine whether or not that logical array matches the rows you expect in your array.
M = magic(100);
largeEnough = M >= 40;
smallEnough = M <= 70;
result1 = M(largeEnough & smallEnough)
Once you have debugged your code, you may want to comment out the definition of those individual logical arrays and assemble the conditions all in one statement. If you do, I would consider splitting them among multiple lines for readability. In this example that's probably overkill because the conditions are so simple, but it's a good habit to develop for when your conditions aren't so simple.
result2 = M((M >= 40) & ...
(M <= 70))
isequal(result1, result2)
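The same idea might be applied to the datastore loop from the question roughly as follows. This is only a sketch: the CSV contents, the bounds, and the mask names (`okAcc`, `okDon`, `okLogP`, `okWeight`, `keep`) are invented for illustration, and it assumes datastore reads the four columns as numeric.

```matlab
% Build a small throwaway CSV so the sketch is self-contained (made-up data).
file_r = [tempname '.csv'];
T = table([1;5;9;3], [2;4;6;8], [0.5;1.5;2.5;3.5], [100;200;300;400], ...
    'VariableNames', {'lip_acc','lip_don','logP_o_w_','Weight'});
writetable(T, file_r);

min_range = [0 3 1 150];   % assumed lower bounds, one per column
max_range = [6 9 4 450];   % assumed upper bounds, one per column

ds = datastore(file_r);
new_data = table;
while hasdata(ds)
    datachunk = read(ds);
    % One logical mask per condition, each with one element per row.
    okAcc    = datachunk.lip_acc   > min_range(1) & datachunk.lip_acc   < max_range(1);
    okDon    = datachunk.lip_don   > min_range(2) & datachunk.lip_don   < max_range(2);
    okLogP   = datachunk.logP_o_w_ > min_range(3) & datachunk.logP_o_w_ < max_range(3);
    okWeight = datachunk.Weight    > min_range(4) & datachunk.Weight    < max_range(4);
    % Combine the masks and index the table directly; no find() is needed.
    keep = okAcc & okDon & okLogP & okWeight;
    new_data = [new_data; datachunk(keep, :)]; %#ok<AGROW>
end
delete(file_r);   % remove the temporary file
```

Because the masks are rebuilt from each new chunk, nothing has to be reset between iterations, and any single mask can be inspected at a breakpoint if the selected rows look wrong.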
