About parallel computation and inter-process communication

1 view (last 30 days)
Ash
Ash on 12 Jul 2014
Commented: Ash on 15 Jul 2014
Hello all!
I have a piece of code that finds patterns in sequences of strings of varying length. Nothing overly complex, except that the main code includes three loops. The basic workflow is as follows:
  1. Load the entire data set (essentially as a cell array) consisting of rows of these sequences.
  2. Run the main code
  3. Write the output to a file.
Run sequentially, without any parallel directives, this process takes "x" seconds.
Now: if I change this to:
  1. Load the entire data set
  2. Start matlabpool
  3. invoke spmd(n)
  4. Run the main code.
  5. Write the output to file.
The run time is approximately "10x"!!
The machine on which this is being run: 12 GB RAM, a 6-core i7, etc.
From my understanding, upon invoking spmd (since I am just interested in letting different workers perform the same job on different sets of data), the total data set is divided automatically. So, logically, the run time should decrease.
However, while trying to figure this out, I also divided the data set into worker-specific files that are loaded based on the respective "labindex". That did not provide any relief or answers either.
I have some background with MPI and F90, so I am assuming that the significantly increased run time with more than one worker is probably due to inter-process communication. If that is so, is there any way to prevent it?
The problem I am trying to solve is a disjoint one: one set of data has no bearing on another, so there is no real need for one worker to talk to another.
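For reference, this is roughly the shape of the labindex-based approach I tried (a minimal sketch; the file names, variable names, and process_sequences function are made up for illustration):

```matlab
% Sketch: each worker loads and processes only its own pre-split chunk.
% Assumes the data was saved beforehand as chunk_1.mat ... chunk_n.mat,
% each containing a cell array 'data' (names are hypothetical).
matlabpool open 6      % syntax as of R2014a; later releases use parpool
spmd
    s = load(sprintf('chunk_%d.mat', labindex)); % worker k reads chunk_k.mat
    result = process_sequences(s.data);          % hypothetical per-chunk work
end
matlabpool close
```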
Any insight would be greatly appreciated. This really has me intrigued.
Cheers!

Answers (1)

Edric Ellis
Edric Ellis on 14 Jul 2014
What sort of data are you passing into SPMD? Inside SPMD, only distributed arrays are automatically operated on in parallel. For example:
x = rand(5000);
xd = distributed.rand(5000);
spmd
    x = x * x;    % every worker operates on its own full copy of 'x'
    xd = xd * xd; % each worker holds a slice of 'xd', and they collaborate
end
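After the spmd block, the distributed result can be collected back on the client with gather (a short sketch building on the example above):

```matlab
xd = distributed.rand(5000);
spmd
    xd = xd * xd; % performed cooperatively across the workers' slices
end
y = gather(xd);   % assemble the full 5000x5000 result on the client
```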
3 comments
Edric Ellis
Edric Ellis el 15 de Jul. de 2014
Edited: Edric Ellis on 15 Jul 2014
Unless you need the (MPI-style) communication available within SPMD, you might be better off using PARFOR which can automatically divide up your problem. For example:
% Build 'c', a 50x1 cell array where each cell is 100x100
c = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
% Operate on 'c' in parallel
out = cell(numel(c), 1); % preallocate the sliced output
parfor idx = 1:numel(c)
    out{idx} = max(abs(eig(c{idx})));
end
The key to getting PARFOR to work in this case is to index into your cell array ("c" in the example above) using the loop variable; this ensures the data is 'sliced' and can therefore be operated on efficiently in parallel.
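To illustrate the slicing point (a minimal sketch; the variable names and the threshold operation are made up):

```matlab
c = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
n = numel(c);
out = cell(n, 1);
thresh = 0.5;         % 'thresh' is a broadcast variable: sent once to each worker
parfor idx = 1:n
    seq = c{idx};     % c{idx} is sliced: each worker receives only its own cells
    out{idx} = sum(seq(:) > thresh);
end
% By contrast, indexing 'c' with anything other than the loop variable
% prevents slicing, and the whole array must be sent to every worker.
```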
Ash
Ash on 15 Jul 2014
I had looked at parfor earlier. Let me make some changes to the code and get back with my findings. I really appreciate your input. Thanks...
