Vectorize a loop to save time
1 visualización (últimos 30 días)
Mostrar comentarios más antiguos
Filip
el 3 de Feb. de 2019
Comentada: Walter Roberson
el 4 de Feb. de 2019
I have a big data set and my current code takes 2 hours. I am hoping to save time by vectorization if that is possible in my case.
I have a table Table with variables ID, t1, tend, p. My code is sth like:
x=zeros(size(Table.ID,1));
for i=1:size(Table.ID,1)
x(i)=sum(Table.t1<Table.t1(i) & Table.tend>Table.tend(i) & abs(Table.p-Table.p(i))>1);
end
So for each observation, I want to find number of observations that start before, ends after and have a p value in the neighborhood of 1. It takes 2 hours to run this loop. Any suggestion?
Thanks in advance!
2 comentarios
Walter Roberson
el 4 de Feb. de 2019
How are the t1 and tend values arranged? Are tend(i+1) = t1(i) such that together they partition into consecutive ranges that are completely filled between the first and last? Do they act to partition into non-overlapping ranges but with gaps? Are there overlapping regions? Are the boundaries already sorted?
Respuesta aceptada
Jan
el 4 de Feb. de 2019
Editada: Jan
el 4 de Feb. de 2019
2 hours sounds long. Is the memory exhausted and the virtual memory slows down the execution? How large is the input?
Is this a typo:
x = zeros(size(Table.ID,1))
It creates a square matrix, but you access it as vector obly.
Does the table access need a remarkable amount of time?
n = size(Table.ID,1);
t1 = Table.t1;
tend = Table.tend;
p = Table.p;
x = zeros(n, 1);
for i = 1:n
x(i) = sum(t1 < t1(i) & tend > tend(i) & abs(p - p(i)) > 1);
end
If you sort one of the vectors, you could save some time:
[t1s, index] = sort(t1);
tends = tend(index);
ps = p(index);
for i = 2:n
m = t1s < t1s(i);
x(i) = sum(tends(m) > tends(i) & ...
abs(ps(m) - ps(i)) > 1);
end
Afterwards x has to be sorted inversly. If you provide some inputs, I could check the code before posting. I'm tired, perhaps I've overseen an obvious indexing error.
Is the shown code really the bottleneck of the original code?
Más respuestas (1)
Walter Roberson
el 4 de Feb. de 2019
My mind is headed towards creating a pairwise mask matrix,
M = squareform(pdist(Table.p) > 1); %important that Table.p is a column vector
That would be comparatively fast. If the table is very big then it could fill up memory, though.
abs() is not needed for this; pdist will already have calculated distance as a non-negative number.
Now
Mi = M(i,:);
x(i)=sum(Table.t1(Mi)<Table.t1(i) & Table.tend(Mi)>Table.tend(i));
However you should do timing tests against
Mi = M(i,:);
x(i)=sum(Mi & Table.t1<Table.t1(i) & Table.tend>Table.tend(i));
and
Mi = M(i,:);
Tt = Table(Mi);
x(i)=sum(Tt.t1<Table.t1(i) & Tt.tend>Table.tend(i));
2 comentarios
Walter Roberson
el 4 de Feb. de 2019
abs(T.t1 - T.t1.')
would work as a distance function for you in R2016b and later.
Ver también
Categorías
Más información sobre Matrix Indexing en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!