execution time with or without parfor
I have some simple code for testing parfor with my local profile (4 cores):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%code 1
matlabpool open 4   % or 2, or 1
tic;
parfor i = 1:30
    res = 0;
    for n = 1:3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
toc;
matlabpool close
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%code 2
tic;
for i = 1:30
    res = 0;
    for n = 1:3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
toc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I executed code 1 using 8, 4, 2, or 1 labs, and also executed code 2. The results are:

code 1 - 8 labs (4 cores with hyperthreading) --> 15 sec
code 1 - 4 labs --> 22 sec
code 1 - 2 labs --> 35 sec
code 1 - 1 lab  --> 65 sec
code 2          --> 18 sec
Given these results, it seems better to use code 2 and release all the other cores (one should also consider the time needed to run 'matlabpool open' and 'matlabpool close'). I have read this: http://www.mathworks.co.uk/matlabcentral/answers/44734-there-is-aproblem-in-parfor
but in my case the execution time is much longer than the setup time of the parallel mechanism.
If there is nothing wrong with my results, the main question is: when is it better to use parfor?
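Since the pool startup cost matters here, one way to check it is to time the setup, the loop, and the teardown separately. This is a sketch using the same matlabpool syntax as the code above (on current MATLAB releases, parpool and delete(gcp('nocreate')) replace matlabpool):

```matlab
% Time the pool startup, the parallel loop, and the teardown separately
tic; matlabpool open 4; tSetup = toc;

tic;
parfor i = 1:30
    res = 0;
    for n = 1:3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
tLoop = toc;

tic; matlabpool close; tTeardown = toc;
fprintf('setup %.2f s, loop %.2f s, teardown %.2f s\n', tSetup, tLoop, tTeardown);
```

If tSetup dominates tLoop, parfor will only pay off for longer-running loops or for a pool that is opened once and reused across many loops.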
17 comments
Matt J
3 Feb 2014
I can't reproduce that, I'm afraid. I see close to linear speed-up with 2, 4, and 12 workers in the pool. What version of MATLAB are you using, and what CPU(s)?
Edric Ellis
4 Feb 2014
NUMLABS is designed to return 1 inside PARFOR because you cannot use labSend/labReceive there. This is described in the documentation.
amir
4 Feb 2014
Matt J
4 Feb 2014
NUMLABS will only return a meaningful value inside an SPMD...END block.
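A minimal sketch of that distinction, assuming an open pool of 4 workers with the era's matlabpool syntax:

```matlab
matlabpool open 4

parfor i = 1:4
    % Inside parfor, numlabs is defined to return 1,
    % since labSend/labReceive cannot be used here
    fprintf('parfor iteration %d: numlabs = %d\n', i, numlabs);
end

spmd
    % Inside spmd...end, each worker sees the real pool size
    fprintf('spmd: lab %d of %d\n', labindex, numlabs);
end

matlabpool close
```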
Matt J
4 Feb 2014
@mohammad
Are there any other machines available to you that you could test it on, to check whether the problem is platform-dependent?
amir
4 Feb 2014
amir
5 Feb 2014
As I mentioned here, I ran the first version of the code and successfully achieved near-linear speed-up with PARFOR. That was with R2013b. I haven't run the second version of the code yet, but I don't see any significant modification in it that would lead me to expect a different result.
So the slow behavior you're seeing has to be environment-related.
Here are my results when I run the modified version of the test code for poolsize = 0:12. The three columns correspond to R2011b, R2012b, and R2013b:

poolsize   R2011b    R2012b    R2013b
       0   19.9430   20.4689   21.0302
       1   21.1632   21.8318   23.0208
       2   10.6021   10.7968   11.5326
       3    7.0738    7.3209    7.9293
       4    5.7969    5.9354    6.1944
       5    4.3994    4.5522    4.9174
       6    3.7105    3.8611    4.1811
       7    3.6653    3.7533    3.9924
       8    3.0179    3.1299    3.2726
       9    2.9612    3.0899    3.2563
      10    2.3155    2.3643    2.5791
      11    2.3111    2.3792    2.5677
      12    2.3000    2.3633    2.6129
Interestingly, performance gets a bit slower with more recent releases. I'm not sure whether that's a significant trend, though. This is on a dual hexacore Intel Xeon X5680 @ 3.33 GHz.
So... still baffled.
Matt J
6 Feb 2014
Any difference if you pre-allocate A first?
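For reference, pre-allocating A means the sliced output variable is not grown inside the loop; a sketch of the change:

```matlab
A = zeros(1, 30);   % pre-allocate the sliced output variable
parfor i = 1:30
    res = 0;
    for n = 1:3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
```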
Matt J
6 Feb 2014
"I wish I had one system like that. Do you fly with it?"
Not always. Like you, I've also had cases where PARFOR mysteriously under-performs in environment-dependent ways. See this thread, for instance.
Matt J
6 Feb 2014
You're not doing any of this over a network, are you? This is all on a local CPU?