distributed arrays slow with batch jobs

Maria on 28 Oct 2021
Commented: Maria on 1 Nov 2021
Hi,
I am working with distributed arrays.
As far as I understand, I can create distributed arrays directly on a cluster, and when I want to manipulate the contents of a distributed array, I need to use spmd.
I wanted to avoid any interactive pool. For this reason, I created a function that uses a distributed array and sent it to the cluster as a batch job. The function looks like this:
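function R = my_distributed_function(input)
N = input;  % assumption: the input argument is the matrix size (N is otherwise undefined)
R = eye(N, 'distributed');  % N-by-N identity matrix stored as a distributed array
for k = 1:N
    for m = 1:N
        R(k,m) = 1*m;  % element-wise assignment into the distributed array
    end
end
end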
I then submit this to the cluster as a batch job:
job_distributed = batch(c,@my_distributed_function,1,{myinput},'Pool',N-1,'CurrentFolder','.','AutoAddClientPath',false);
However, this takes very long, around 64 seconds. The same function without the 'distributed' option takes around 2 ms.
If I do not use the batch job but keep the 'distributed' option, an interactive pool starts. In that case the computation takes around 2 seconds, plus the time needed to start the parallel pool.
My question is: why does the batch job take so long when the function uses distributed arrays?

Accepted Answer

Thomas Falch on 29 Oct 2021
A batch job with the 'Pool' option (a "batch-pool job") ends up starting the equivalent of an interactive pool, but uses one of the workers as a substitute for the MATLAB desktop client. The overall time for such a job is therefore the pool startup time plus the actual work you're doing. In other words, it will take about the same amount of time as an interactive pool.
The main benefit of a batch-pool job is that you can submit the job to the cluster and then shut down the MATLAB desktop client (and indeed the computer it's running on). Meanwhile, the job keeps running on the cluster, and you can come back much later to get the results. This is useful for long-running jobs that don't require any user input (interactive work is what interactive pools are for).
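The submit-and-come-back-later workflow described above might look roughly like the following. This is only a sketch: it assumes a default cluster profile is configured and has at least 4 free workers, and it uses the built-in magic function purely as a stand-in for a real long-running function.

```matlab
% Sketch only: assumes a configured default cluster profile.
c = parcluster();                 % default cluster profile

% 'Pool',3 requests 3 pool workers plus 1 worker that plays the role of
% the client, so the job occupies 4 workers in total.
% magic(4) is a placeholder for the real, long-running function.
job = batch(c, @magic, 1, {4}, 'Pool', 3);
jobID = job.ID;                   % note the ID before closing MATLAB

% ... the desktop client can be shut down here; the job keeps running ...

% Later, possibly in a new MATLAB session:
c = parcluster();
job = findJob(c, 'ID', jobID);    % look the job up again by its ID
wait(job);                        % block until the job has finished
out = fetchOutputs(job);          % cell array, one cell per output argument
R = out{1};
delete(job);                      % remove the job's data from the cluster
```

The elapsed time of such a job includes starting the pool on the cluster, which is why a very short computation can look slow when submitted this way.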
3 Comments
Thomas Falch on 1 Nov 2021
This happens whenever you use the 'Pool' option to batch() (or equivalently using createCommunicatingJob()).
If you use parfor with batch() without the 'Pool' option (or equivalently with createJob/createTask), it will probably not work as you expect. It will not cause any kind of pool to be opened, and the loop will basically run as a regular for loop (on a single worker of your cluster).
This is the same behavior you would get if you ran a parfor loop in the MATLAB desktop client without an interactive pool open and with the option to start a parpool automatically on encountering a parfor disabled (or without the Parallel Computing Toolbox installed).
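To make the difference concrete, here is a minimal sketch of the two submission styles. The function sum_squares is hypothetical (it is not from this thread), and the pool size of 2 is an arbitrary choice that assumes the cluster has at least 3 free workers.

```matlab
% --- sum_squares.m (hypothetical helper, saved as its own file) ---
function s = sum_squares(n)
    s = 0;
    parfor k = 1:n
        s = s + k^2;      % reduction variable, valid with or without a pool
    end
end

% --- submit_jobs.m (run on the client) ---
c = parcluster();         % assumes a configured default cluster profile

% Without 'Pool': one worker runs the function, no pool is opened,
% and the parfor loop executes like an ordinary for loop.
jobSerial = batch(c, @sum_squares, 1, {100});

% With 'Pool',2: one worker acts as the client and 2 more workers
% execute the parfor iterations (3 workers in total).
jobPool = batch(c, @sum_squares, 1, {100}, 'Pool', 2);

wait(jobSerial);
wait(jobPool);
outSerial = fetchOutputs(jobSerial);
outPool = fetchOutputs(jobPool);
delete(jobSerial);
delete(jobPool);
```

Both jobs return the same value; only the way the loop is executed differs, and the second job also pays the pool startup cost.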
Maria on 1 Nov 2021
Thank you for the clarification. I had completely misunderstood this point. I have some tasks created with createTask that contain parfor loops, and I thought the parfor would be executed in parallel... But now that I think about it, of course it is not, because createTask creates the task at the worker level, and there is one worker per core.

Release

R2021a
