Trouble vectorising a loop

I'm trying to tidy up and improve some of my code and have come across a loop that I am struggling to vectorise. I am calculating the mean value of specific parts of a matrix and replacing certain values within the matrix with the newly calculated values.
For example, if I have a matrix 'A' I would like to replace element A(3,1) with the mean of A(2,1) and A(3,1). Likewise, I would then like to replace A(3,2) with the mean of A(2,2) and A(3,2) and so on (although I don't want to do this for every single column, just specific ones). This process is also repeated at various other locations in the matrix (i.e. also replace A(6,1) with the mean of A(5,1) and A(6,1) and so on).
The looped version of my code is of the following form:
A = rand(7,6);
firstRows=[2;5];
lastRows=[3;6];
columnIndicies=[1 2 3];
A2=A;
for j=1:1:length(lastRows)
A2(lastRows(j),columnIndicies)= ...
mean(A((firstRows(j)):(lastRows(j)),columnIndicies),1);
end
A3 = A2(lastRows,:);
Any comments / assistance would be much appreciated.

 Respuesta aceptada

Matt J
Matt J el 9 de Jul. de 2013
Editada: Matt J el 9 de Jul. de 2013

0 votos

J=arrayfun(@(a,b) (a:b), firstRows,lastRows,'uni',0);
I=cellfun(@(a,b)ones(1,length(a))*b ,J.',num2cell(1:length(J)),'uni',0);
S=cellfun(@(a,b)ones(1,length(a))/length(a) ,J,'uni',0);
B=sparse([I{:}],[J{:}],[S{:}],length(lastRows),size(A,1));
A2=A;
A2(lastRows,columnIndices)=B*A(:,columnIndices);

8 comentarios

Rhys
Rhys el 10 de Jul. de 2013
Thanks Matt, that's definitely an elegant way of doing it.
I had to remove the .' (from J.' on the second line) to get it to work but this answer is the more useful of the two because the number of rows between corresponding 'firstRows' and 'lastRows' elements is not always constant in my application.
However, if I tic toc around the functions I find that the unlooped version is actually a bit slower than the looped example. Is this normal?, I've always approached MatLab with the philosophy that loops are slow...
Jan
Jan el 10 de Jul. de 2013
The idea that loops are slow in general belongs to Matlab before version 6.5. In the last decade Matlabs JIT acceleration show good improvements in handling loops efficiently. But the old rumor of slow loops will never die.
While the processors are very fast today, the RAM is not. The darwback of creating large temporary arrays in a vectorized method can exceed the benefit of the vectorized processing. Then loops are faster.
Muthu Annamalai
Muthu Annamalai el 10 de Jul. de 2013
I hear good things about bsxfun for binary ops on matrices. Could your situation allow you to use that instead of arrayfun/cellfun ?
Matt J
Matt J el 10 de Jul. de 2013
Editada: Matt J el 10 de Jul. de 2013
However, if I tic toc around the functions I find that the unlooped version is actually a bit slower than the looped example. Is this normal?, I've always approached MatLab with the philosophy that loops are slow...
Rhys, As Jan says, whether you would outperform a for-loop depends on many factors. We would have to know your actual sizes of A, firstRows, lastRows, etc.... We would have to know how big
mean(lastRows-firstRows)
typically is. We would also have to know where you are putting your tic/toc.
One thing I would stress, for example, is that if firstRows and lastRows are fixed and you plan to be re-applying them to multiple A, then your tic and toc should go around just the last line
tic;
A2(lastRows,columnIndices)=B*A(:,columnIndices);
toc
because the preceding lines are just fixed pre-computations and constitute non-recurring computational costs.. That last line I would expect to perform much faster than re-applying your original for-loop for every new A.
Matt J
Matt J el 10 de Jul. de 2013
It also helps if you chop out the unused columns before you begin processsing. Below is a timing comparison that I did. Even with the pre-processing needed to set up the sparse matrix B, I see a significant speed-up.
N=5000;
A = rand(N);
firstRows=1:20:N-20;
lastRows=firstRows+10;
columnIndices = 1:10:N ;
A2_1=A; A2_2=A;
A=A(:,columnIndices);
A0=A;
tic
for j=1:1:length(lastRows)
A0(lastRows(j),:)= ...
mean(A((firstRows(j)):(lastRows(j)),:),1);
end
A2_1(:,columnIndices)=A0;
toc;
%Elapsed time is 0.076621 seconds.
tic;
J=arrayfun(@(a,b) (a:b), firstRows,lastRows,'uni',0);
I=cellfun(@(a,b)ones(1,length(a))*b ,J,num2cell(1:length(J)),'uni',0);
S=cellfun(@(a,b)ones(1,length(a))/length(a) ,J,'uni',0);
B=sparse([I{:}],[J{:}],[S{:}],length(lastRows),size(A,1));
A0(lastRows,:)=B*A;
A2_2(:,columnIndices)=A0;
toc;
%Elapsed time is 0.012123 seconds.
Rhys
Rhys el 10 de Jul. de 2013
Matt, you are correct in saying that the line
A2(lastRows,columnIndices)=B*A(:,columnIndices);
is fast, the array/cellfun calls are the slowest parts of the code. As for some real numbers;
size(A) ~ 1000 100
length(firstRows)=length(lastRows) ~ 100
mean(lastRows-firstRows) ~ 4
Either method doesn't take long to perform, but the method will be called 1000's of times. The tic/toc is currently around the whole piece of code. I get your point about the pre-computations but unfortunately the firstRows and lastRows values are not fixed. They may be the same in some cases but definitely not in all (this is something I'll have to look into further).
Matt J
Matt J el 10 de Jul. de 2013
Rhys, see my most recent version (and timing comparison) above. Chopping out the unused columns of A helped a lot. My version exceeded the for-loops signficantly, even including the cellfun calls.
Rhys
Rhys el 11 de Jul. de 2013
Thanks Matt, chopping the unused columns out of the calculation really does the trick!

Iniciar sesión para comentar.

Más respuestas (2)

Matt J
Matt J el 9 de Jul. de 2013
Editada: Matt J el 9 de Jul. de 2013

0 votos

idx=false(size(A));
idx1=idx; idx1(firstRows,columnIndices)=1;
idx2=idx; idx2(lastRows,columnIndices)=1;
A2=A;
A2(idx2)=(A(idx1)+A(idx2))/2;
Jan
Jan el 11 de Jul. de 2013
Editada: Jan el 11 de Jul. de 2013

0 votos

Another idea:
S = cumsum(A, 1);
B = S(lastRows, :) - S(firstRows - 1, :);
A0(lastRows, :) = bsxfun(@rdivide, B, lastRows - firstRows + 1);
Three problems: 1. I cannot test this currently, and I frequently confuse -1 and +1. 2. The exception firstRows==1 must be handled. 3. CUMSUM accumulates rounding errors and you have to check, if the accuracy of the results satisfies your needs.
Benefit: It avoid calculating the sum of neighboring elements multiple times in case of overlapping intervals.

Categorías

Más información sobre Loops and Conditional Statements en Centro de ayuda y File Exchange.

Etiquetas

Preguntada:

el 9 de Jul. de 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by