How to use parallel.pool.Constant to save on overhead for a large constant array across all calls to parfeval?

I want to use parallel.pool.Constant to cut down on overhead when calling parfeval in a for loop. However, it seems that it makes no difference. To show this, I use an example that is similar to the example here, but I don't slice the data. Similar results hold if the data is sliced.
N = 80; % number of iterations to run
ppool = parpool("Processes"); % mine defaults to 8 workers
data = rand(100); % data to use at every iteration
C = parallel.pool.Constant(data);
ticBytes(ppool); % only count transfer during for loop
F(1:N) = parallel.FevalFuture;
for i = 1:N
F(i) = parfeval(ppool,@(x) sum(x,"all"),1,C.Value); % just do something with the data
end
A = fetchOutputs(F); % collect it just because
tocBytes(ppool);
This is the result from tocBytes:
Now, if I do the same but pass in data instead of the parallel constant, I get the following:
N = 80; % number of iterations to run
if isempty(gcp('nocreate'))
ppool = parpool("Processes"); % mine defaults to 8 workers
end
data = rand(100); % data to use at every iteration
ticBytes(ppool); % only count transfer during for loop
F(1:N) = parallel.FevalFuture;
for i = 1:N
F(i) = parfeval(ppool,@(x) sum(x,"all"),1,data); % just do something with the data
end
A = fetchOutputs(F); % collect it just because
tocBytes(ppool);
Result from tocBytes:
The total bytes sent to the workers are identical, but I was expecting roughly a 10-fold reduction, since each of the 8 workers should be called about 10 times (80 iterations in total). Am I just missing the purpose of parallel.pool.Constant? Is there some other tool I should use to reduce this overhead?
Daniel Bergman on 26 May 2023
The obvious solution I have since realized is to “batch” all these iterations together, and that works in my particular use case. I'm not sure if or how to use a call to batch to do this, but that's for another time. I am still wondering what the point of a parallel.pool.Constant is if it does not reduce overhead in the case above.
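The “batching” idea can be sketched roughly as follows. This is an assumption about how it might look, not Daniel's actual code: instead of one parfeval call per iteration, it submits one call per block of iterations, so data travels once per block (here, once per worker) rather than once per iteration. It uses plain parfeval, not the separate batch function, and the per-iteration work `sum(x, "all") + k` is only a placeholder.

```matlab
% Sketch: group the N iterations into one parfeval call per worker, so the
% large input is sent once per block instead of once per iteration.
N = 80;                                       % number of iterations to run
ppool = gcp;                                  % current pool (starts one if none exists)
data = rand(100);                             % data used by every iteration
nBlocks = ppool.NumWorkers;                   % one block per worker
edges = round(linspace(0, N, nBlocks + 1));   % block b covers edges(b)+1 : edges(b+1)
% Placeholder per-iteration work (assumption): sum of the data plus the index.
blockFcn = @(x, idx) arrayfun(@(k) sum(x, "all") + k, idx(:));
Fblk(1:nBlocks) = parallel.FevalFuture;
for b = 1:nBlocks
    idx = (edges(b) + 1):edges(b + 1);        % iterations handled by block b
    Fblk(b) = parfeval(ppool, blockFcn, 1, data, idx);
end
A = fetchOutputs(Fblk, "UniformOutput", false);  % one cell per block
A = vertcat(A{:});                               % N-by-1, in iteration order
```

With 8 workers this makes 8 parfeval calls instead of 80, which is where the expected reduction in transferred bytes would come from.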
Walter Roberson on 26 May 2023
Suppose you are using parfor instead of parfeval(), and you have
data = rand(100);
parfor i = 1:N
    % some calculation involving data
    % some other calculation
    % third calculation involving data
end
Then does data get sent to all of the workers once, at initialization time, or to each worker the first time it needs data but never again for that worker? Or does it get sent each iteration? Or multiple times per iteration?
Using parallel.pool.Constant brings some certainty to this: the value is transmitted to each worker the first time that worker needs it, and is then held in memory on the worker.
Now imagine that the code appears to modify data. parfor can see the constant-ness and declare it inconsistent to modify the constant.
Now imagine that the code uses clear. To be honest, I do not know what will happen in that case.
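Walter Roberson's parfor scenario, with the data wrapped in a parallel.pool.Constant, might look like the sketch below. The calculation `i * sum(C.Value, "all")` is a placeholder assumption; the point is that C is used as a broadcast variable and its .Value is read on the workers, which per the behavior described above receive the value the first time they need it.

```matlab
% Sketch: the parfor case with the data held in a parallel.pool.Constant.
N = 80;                                 % number of iterations
data = rand(100);                       % large array shared by all iterations
C = parallel.pool.Constant(data);       % value kept on each worker once it has been sent
s = zeros(1, N);                        % sliced output variable
parfor i = 1:N
    % Placeholder calculation (assumption): C.Value is read on the worker.
    s(i) = i * sum(C.Value, "all");
end
```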


Accepted Answer

Edric Ellis on 30 May 2023
You've got a subtle flaw in your first piece of code that allows the program to run, but it does not actually benefit from the use of parallel.pool.Constant. Your first snippet says this:
F(i) = parfeval(ppool,@(x) sum(x,"all"),1,C.Value);
Note that this extracts the .Value field from C at the client. In other words, you are not transferring the parallel.pool.Constant for execution. To benefit from the Constant, instead do this:
F(i) = parfeval(ppool,@(x) sum(x.Value,"all"),1,C);
This way, the input to the parfeval function is C itself, and the .Value is extracted on the workers.
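For reference, here is the first script from the question with only that change applied, plus a check that reuses an existing pool instead of always calling parpool. It is a sketch; the byte counts tocBytes reports will depend on the pool size and MATLAB version, so none are shown here.

```matlab
% First snippet from the question, corrected: pass the Constant C itself to
% parfeval and read .Value inside the function, which runs on the worker.
N = 80;                                   % number of iterations to run
ppool = gcp('nocreate');                  % reuse an existing pool if there is one
if isempty(ppool)
    ppool = parpool("Processes");
end
data = rand(100);                         % data to use at every iteration
C = parallel.pool.Constant(data);
ticBytes(ppool);                          % only count transfer during the loop
F(1:N) = parallel.FevalFuture;
for i = 1:N
    % The input is C (the Constant), not C.Value; x.Value is read on the worker.
    F(i) = parfeval(ppool, @(x) sum(x.Value, "all"), 1, C);
end
A = fetchOutputs(F);                      % N-by-1 results
tocBytes(ppool);
```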
Daniel Bergman on 30 May 2023
Thank you! That did help: now only 1.1e+06 bytes are sent to the workers. Not quite the theoretical 10x optimum, but better!


More Answers (1)

Walter Roberson on 26 May 2023
Version history
R2023a: Constant objects no longer automatically transferred to workers
MATLAB will no longer automatically transfer Constant objects from your current MATLAB session to workers in a parallel pool. MATLAB will send the Constant object to workers only if the object is required to execute your code.
===
To me, that implies that each worker needs to actively "pull" the value the first time it needs it. Since different workers would need it at different times, that implies to me that, whatever happened historically, the entire contents of the Constant are now copied to each worker individually. Hypothetically, in the past there might have been some kind of "broadcast" mode that could send the information to all of the workers at the same time without duplicating it for each worker.
The current documentation wording does not completely rule out the possibility that the constants might be deposited into shared memory, but I doubt that is happening.
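If even one copy per worker is too costly, parallel.pool.Constant also accepts a function handle, `parallel.pool.Constant(fcn)`, which each worker evaluates to build its own copy, so the array itself is not sent from the client. The sketch below assumes the data can be recreated deterministically on every worker; the `reshape(...)` expression is a stand-in for generating or loading the real data. Something like `@() rand(100)` would give each worker a different array.

```matlab
% Sketch: build the Constant on the workers from a function handle, so the
% 100-by-100 array is never transferred from the client.
N = 80;                                 % number of iterations to run
ppool = gcp;                            % current pool (starts one if none exists)
% Deterministic stand-in for the real data (assumption); each worker runs this itself.
C = parallel.pool.Constant(@() reshape(1:1e4, 100, 100) / 1e4);
ticBytes(ppool);                        % count transfers during the loop only
F(1:N) = parallel.FevalFuture;
for i = 1:N
    F(i) = parfeval(ppool, @(x) sum(x.Value, "all"), 1, C);  % .Value read on the worker
end
A = fetchOutputs(F);                    % N-by-1 results
tocBytes(ppool);
```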
Daniel Bergman on 30 May 2023
@Edric Ellis In the parfeval example here, is there any benefit to using parallel.pool.Constant? Based on your answer above, it sounds like the client slices c.Value and transfers that column, so the constant c is not being transferred to the backgroundPool or its workers.
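Following the reasoning in the accepted answer, one way to keep the slicing on the worker is to pass the Constant together with the column index and index into .Value inside the function. This is a sketch, not the documentation example itself: the data X, its size, and the use of gcp rather than backgroundPool are assumptions.

```matlab
% Sketch: pass the Constant and a column index, and slice c.Value on the worker.
X = rand(100, 8);                        % hypothetical data with 8 columns
c = parallel.pool.Constant(X);
pool = gcp;                              % current pool (starts one if none exists)
Fcol(1:8) = parallel.FevalFuture;
for k = 1:8
    % The client sends only c and the scalar k; the column is extracted on the worker.
    Fcol(k) = parfeval(pool, @(cc, col) max(cc.Value(:, col)), 1, c, k);
end
m = fetchOutputs(Fcol);                  % 8-by-1 column maxima
```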


Categories: Parallel Computing Fundamentals

Version: R2023a
