How can I isolate data from a large input file?

Rob on 15 Oct 2013
Commented: dpb on 18 Oct 2013
I have a number of large data files with approximately 8,000,000 rows and 10 columns. The data is taken from a train and monitors various inputs over a number of days. The 10th column indicates direction of the train with 1 and -1 for differing direction and 0 for when the train is at a standstill.
Each time the train changes direction I would like to be able to create a new variable that stores all the following data until the next direction change.
I am able to do this manually, by examining the data and finding the index where a direction change is indicated, i.e. 1 becomes -1. I would like to make a process that could automate this.
Any help would be greatly appreciated.
  1 Comment
Jan on 17 Oct 2013
As usual, a short meaningful example would reveal the important details. Neither the meaning of the variables (Matlab does not care whether this is a train, a price or a temperature) nor the fact that it is the 10th column matters. So perhaps your question could be simplified to:
x = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1]
How can I find indices of changes from -1 to +1 and vice versa ignoring the zeros?


Accepted Answer

dpb on 15 Oct 2013
Edited: dpb on 15 Oct 2013
I suggest not creating new variables but indexing into the existing one.
That's a very useful coding scheme and easy to deal with.
To find the direction changes, use
ixdir=find(abs(diff(x(:,10)))==2)+1; % first row of each new direction
The first direction section is then 1:ixdir(1)-1; the second is ixdir(1):ixdir(2)-1, etc. Processing those in sequence is quite easy with the indices, without needing different variables.
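As a minimal sketch of processing the sections by index, the example below builds start and stop rows from ixdir and loops over them. The small matrix x is made-up illustration data, the names starts, stops and seg are hypothetical, and the direction column is assumed to contain no zeros.

```matlab
% Hypothetical data: column 1 is a stand-in signal, column 10 is direction
% (+1/-1 only; zeros are assumed to have been removed already)
x = zeros(8,10);
x(:,1)  = (1:8).';
x(:,10) = [1 1 1 -1 -1 1 1 1].';

ixdir  = find(abs(diff(x(:,10)))==2)+1;  % first row of each new direction
starts = [1; ixdir];                     % first row of every section
stops  = [ixdir-1; size(x,1)];           % last row of every section

for k = 1:numel(starts)
    seg = x(starts(k):stops(k), :);      % all columns of section k
    fprintf('section %d: rows %d-%d, direction %+d\n', ...
            k, starts(k), stops(k), seg(1,10));
end
```

Here starts is [1;4;6] and stops is [3;5;8], so each section can be handled inside the loop without copying it into a separately named variable.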
  5 Comments
Rob on 17 Oct 2013
With the 0's removed the original solution works fine!
dpb on 17 Oct 2013
Yeah, that was the basis I was working on...
There has to be a way with the zeros included that's also pretty concise, but at the moment the neatest "trick" eludes me. I'm thinking that if you substitute +/-1 for each zero based on the sign preceding it, then the above works as well; I just haven't got a one-liner for the substitution yet.


More Answers (3)

sixwwwwww on 15 Oct 2013
Dear Rob, here is the solution to your problem:
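A = [0 0 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 0 0 0 1 0 0 -1 0 0 1];
indx = [1 find(A)];                  % boundaries: first sample plus every nonzero entry
B = cell(1, length(indx) - 1);       % one cell per segment
for i = 1:length(indx) - 1
B{i} = A(indx(i):indx(i + 1));       % segment i, including both boundary samples
end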
Now replace A with your 10th column (transposed to a row vector) and it should work fine. Note that this assumes 1 and -1 appear alternately, separated by 0s, as you can see in the vector A. I hope it helps. Good luck!
  1 Comment
dpb on 15 Oct 2013
The difficulty here is that, if I understand correctly, every moving sample is +/-1 regardless of direction, not just the first sample of a move.



Jan on 17 Oct 2013
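You can first replace each zero with the preceding nonzero value:
x = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];
idx = (x ~= 0);          % true where the train is moving
x2 = x(idx);             % nonzero values only
xf = x2(cumsum(idx));    % each zero takes the last nonzero value before it; x(1) must be nonzero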
Now strfind can look for [1, -1] and [-1, 1] in xf, or you can use diff(xf) and search there.
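Both search options can be sketched on the same example vector. The names chg1 and chg2 are hypothetical, and the +1 is an added convention so that each index points at the first sample of the new direction.

```matlab
x   = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];
idx = (x ~= 0);
x2  = x(idx);
xf  = x2(cumsum(idx));    % zeros filled with the preceding nonzero value

% Option 1: pattern search (strfind also accepts numeric row vectors)
chg1 = sort([strfind(xf, [1 -1]), strfind(xf, [-1 1])]) + 1;

% Option 2: a step of +/-2 in the filled vector marks a reversal
chg2 = find(abs(diff(xf)) == 2) + 1;
```

For this vector both options give [8 10 12 16]. The diff variant also works on column vectors, whereas the strfind variant needs a row vector.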

dpb on 18 Oct 2013
It finally came to me!!! :)
Actually, was looking at it wrong -- to find the beginning of a movement you don't care which direction the move is in--only that it's a change from stopped.
Hence, the index you want is
idx=find(diff(abs(v))==1)+1; % all the points of start from stop
The direction is
sign(v(idx))
where v is the direction column in your data, of course.
This finds only the starts embedded in the data; if the train is already moving at the beginning of the data record, that first movement is discarded by the above as an incomplete record. If you want that one too, prepend a zero to the v vector before doing the diff() and then remove the +1 correction.
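The two variants can be compared on a short made-up direction vector in which the train is already moving at the first sample. The vector v and the names idx0 and dirs are hypothetical, and v is assumed to be a row vector.

```matlab
v = [1 1 0 0 -1 -1 0 1 1 0];          % hypothetical direction samples (row vector)

idx  = find(diff(abs(v)) == 1) + 1;   % starts that follow a stop
idx0 = find(diff([0 abs(v)]) == 1);   % same, plus a start at sample 1
dirs = sign(v(idx0));                 % direction of each movement
```

Here idx is [5 8], while idx0 is [1 5 8] with dirs equal to [1 -1 1]. For a column vector the prepended zero would need a semicolon, as in [0; abs(v)].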
  2 Comments
Rob on 18 Oct 2013
This looks very interesting, I will give it a whirl and be sure to let you know how it goes! thanks again
dpb on 18 Oct 2013
Edited: dpb on 18 Oct 2013
OK, one other caveat -- it does require at least one "stopped" measurement between reversals of direction -- the above doesn't find the +/-2 points. I presumed that isn't possible given the sample frequency compared to how quickly the direction can realistically reverse. If it is possible, "or" abs(diff(...))==2 with the above before find() and you'll have both. Note that you have to keep the sign for this test, since a direct reversal disappears once abs() is applied to v.
That is, specifically,
find(diff([0 abs(v)])==1 | [0 abs(diff(v))==2])
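As a rough check of the combined expression, the sketch below uses a made-up row vector that contains one reversal with no stop in between. The names startFromStop, directFlip and dirs are hypothetical, and the test is split into two logical masks only for readability.

```matlab
v = [0 1 1 -1 -1 0 0 1 0 -1];   % hypothetical; samples 3 -> 4 reverse with no stop

startFromStop = diff([0 abs(v)]) == 1;        % 0 -> +/-1
directFlip    = [false, abs(diff(v)) == 2];   % +1 <-> -1 with no zero between
idx  = find(startFromStop | directFlip);      % first sample of every movement
dirs = sign(v(idx));                          % direction of each movement
```

Here idx is [2 4 8 10] and dirs is [1 -1 1 -1]; the start at sample 4 comes only from the directFlip term.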
