Fast way to perform multiple searches on a large array
Mostrar comentarios más antiguos
I have a large time series array (10,000,000 elements) :
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above) :
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large ~ 30000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many many finds. Logic is :
results = NaN(300, numel(st));
for i=1:numel(st);
temp = ts(find(times > st(i) & times < en(i) , 300,'first');
results(:,i) = temp;
end;
Is there any ay I do this faster (ideally without a loop) ?
- I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel
8 comentarios
Jan
el 4 de Oct. de 2011
I assume "time" should be "times" inside the loop.
Nigel
el 4 de Oct. de 2011
Andrei Bobrov
el 4 de Oct. de 2011
What size of 'st'?
Nigel
el 4 de Oct. de 2011
Jan
el 4 de Oct. de 2011
Do the intervals [st(i):en(i)] overlap?
Nigel
el 4 de Oct. de 2011
Daniel Shub
el 4 de Oct. de 2011
Just to confirm times, st and en are all sorted?
Nigel
el 4 de Oct. de 2011
Respuesta aceptada
Más respuestas (2)
Jan
el 4 de Oct. de 2011
Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st) % Not size(st), which is a vector!
temp = ts(find(times > st(i) & times < en(i), 300, 'first');
if length(temp) == 300
results(:, i) = temp;
else
results(1:length(temp), i) = temp;
end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But for vectorizing this, a very large matrix would be needed, such that I assume it will be slower than the loop.
Can you solve the problem by using HISTC?
6 comentarios
Nigel
el 4 de Oct. de 2011
Jan
el 4 de Oct. de 2011
Well, then this answer wasted your and my time. Please include the pre-allocation and all other relevant details, if you post a code-snippet and ask for a speed improvement. Otherwise the trials to optimize your code start from a very unrealistic point.
Nigel
el 4 de Oct. de 2011
Teja Muppirala
el 4 de Oct. de 2011
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
May as well be this:
find(times > st(i), 300, 'first');
Daniel Shub
el 4 de Oct. de 2011
and since times and st are sorted
0:299+find(times > st(i), 1, 'first')
Nigel
el 4 de Oct. de 2011
Nigel
el 4 de Oct. de 2011
0 votos
2 comentarios
Bjorn Gustavsson
el 10 de Oct. de 2011
Well then at least do the consequtive 'find's on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset:end) > st(i), 1,'first');
Then you'd get the benefit from increasingly shorter arrays to search over but without loosing the data.
Daniel Shub
el 10 de Oct. de 2011
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
Categorías
Más información sobre Operators and Elementary Operations en Centro de ayuda y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!