Why is parfor repeating loop iterations?

Dear MATLAB community,
I am running 500 simulations and am using parfor to run several of them in parallel (one simulation per core) so that they finish sooner. However, parfor seems to be repeating some of the simulations, which unnecessarily increases the total run time.
Currently I have a folder for each of the 500 simulations containing its inputs. The code navigates to each folder, loads the inputs, runs the simulation for that set of inputs, and saves the outputs to a folder unique to that set of inputs (I should end up with 500 output folders). Within each parfor iteration there are three additional inputs that may change depending on what I want to test; I select the one I want to use with a for loop. At the moment I am keeping these inputs constant (I specify only one option for each of them).
I can tell that the code is repeating some of the simulations because, for a specific set of inputs, I am sometimes getting multiple output files. Specifically (see the pseudocode), it is as if it were testing different weightings, although I have specified only one weighting. As far as I can tell the repetition is random: of the 92 simulations that completed or started to run, 48 were repeated, and I cannot see any pattern in which ones were repeated.
I have written some pseudocode below to demonstrate what I am doing.
model = {"model1"; "model2"; "model3"};  % ... one entry per model
loadingConditions = "quasiStatic";
costFunction = "sumWeightedAbsoluteErrorSquared";
weights = weight1;
% Do things to check the models, loading conditions, cost functions,
% weights are correctly specified and specify simulation inputs that are
% constant.
% In this I use for loops which use gg, hh, ii, and jj as the index.
numModels = numel(model);
numLoadingConditions = numel(loadingConditions);
numCostFunctions = size(costFunction,2);
numWeights = size(weights,1);
% Clear indices used in for loops outside of parfor to clear temporary
% variables
clear hh ii jj
parfor gg = 1:numModels
    % Specify inputs for simulations that are constant in this set but I
    % might want to change in the future
    % Load simulation inputs
    for hh = 1:numLoadingConditions
        % Import loading condition
        for ii = 1:numCostFunctions
            % Select cost function to use
            for jj = 1:numWeights
                % Specify the output folder, based on the input folder, the
                % loading condition, cost function, and the weighting
                % Select the loading condition, the cost function to use,
                % and the weighting to use
                % Run simulation
                % Save outputs
            end
        end
    end
end
Could this be due to a new parfor iteration starting on a core before the previous iteration had completed? I did not think this was possible. However, because the code failed before completing all the iterations, and each iteration writes an output when it completes, I could see how many iterations were in progress at the time of the failure. Of the simulations that had started, 10 had not written their output, suggesting that 10 iterations were running when the code failed. The computer has only 8 cores, which suggests to me that some cores were running more than one simulation at the same time.
If anyone has any suggestions as to why and as to how to stop this it would be very helpful.
Thanks,
Samuele

6 comments

Benjamin Thompson on 5 Oct 2022
Does it work when run serially, by calling parfor with the maximum number of workers set to 0?
parfor (i = 1:100, 0)
You might also look at the documentation of parforOptions to see if you need to include any files or path dependencies there.
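A minimal sketch of that serial check, assuming a placeholder loop body in place of the real simulation (the variable names and the count of 5 are illustrative only). With the worker limit set to 0, parfor runs the iterations in the client session, so any repeated outputs seen in this mode would not be caused by the pool.

```matlab
% Sketch: run the loop body serially by capping parfor at 0 workers.
numModels = 5;                        % small count for a quick test
outputsWritten = zeros(1, numModels); % sliced output, one slot per model

parfor (gg = 1:numModels, 0)          % 0 workers: iterations run in the client
    % Placeholder for: load inputs, run simulation, save outputs.
    outputsWritten(gg) = 1;
end

% Each model should have produced exactly one output.
disp(outputsWritten)
```

If the duplicates disappear in this mode, the cause is more likely in how the pool handles the loop than in the loop body itself.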
Samuele Gould on 5 Oct 2022
Thanks for the suggestions. I will test running it in a serial manner and see what comes out.
I don't understand why parforOptions would be needed to pass the model files to parfor; if anyone can explain, that would be great. I will try passing the models to the parfor loop with the 'AttachedFiles' option to see if that stops it loading the same models twice.
My (limited) understanding of parfor is that if the index is specified as gg = 1:100 and the pool has 8 workers, it selects 8 values between 1 and 100 (not necessarily sequentially) and runs those 8 iterations. This is repeated until all integers between 1 and 100 have been selected, and it should not select a value that has already been selected. Each model has a unique integer associated with it, which corresponds to a value of the index; the model to load is specified by the value of the index. If the values selected by parfor are not repeated, I would have thought that the models to load (and therefore the simulations to run) should not be repeated either?
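One way to test this understanding is a small sketch that records which index values contribute to the results. The variable names are illustrative, and the check covers only the values parfor returns: side effects such as files written inside the loop are not captured by it.

```matlab
% Sketch: confirm every index contributes to the results exactly once.
numIterations = 100;
indexSeen = zeros(1, numIterations);  % sliced output variable
runCount = 0;                         % reduction variable

parfor gg = 1:numIterations
    indexSeen(gg) = gg;               % record the index that ran
    runCount = runCount + 1;          % count contributing iterations
end

fprintf('Iterations counted: %d of %d\n', runCount, numIterations);
fprintf('All indices present once: %d\n', isequal(indexSeen, 1:numIterations));
```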
It might be a few days before I know if this solves the problem, but I will post here when I know.
Thanks
Walter Roberson on 5 Oct 2022
Check the configuration of your local pool. People sometimes edit the configuration to change the pool worker count from "physical cores" to "logical cores", hoping that hyperthreading gets them extra performance. (Running MATLAB workers on hyperthreads typically results in lower performance, but if the workers are doing notable I/O then performance can improve with hyperthreading.)
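A rough sketch of how the worker counts could be inspected from the command line, assuming the default cluster profile is the one in use. By default maxNumCompThreads usually matches the number of physical cores, so it gives a point of comparison, but treat that as an assumption for your machine.

```matlab
% Sketch: compare configured worker counts against the thread limit.
cluster = parcluster();               % default cluster profile
fprintf('Profile "%s" allows up to %d workers\n', ...
    cluster.Profile, cluster.NumWorkers);

pool = gcp('nocreate');               % do not start a pool just to inspect it
if isempty(pool)
    disp('No pool is currently open.');
else
    fprintf('Open pool has %d workers\n', pool.NumWorkers);
end

fprintf('MATLAB uses up to %d computational threads\n', maxNumCompThreads);
```

A worker count above the number of physical cores would be consistent with seeing 10 simulations in progress on an 8-core machine.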
Walter Roberson on 5 Oct 2022
I would suggest adding a small bit of logging:
  • date and time
  • the ID property of the current task (see https://www.mathworks.com/help/parallel-computing/parallel.task.html); this identifies the worker
  • gg (the parfor variable) at the very least, possibly hh and ii as well
  • a "task completed" entry with date/time for each ID + gg combination
  • consider wrapping the work in a try/catch in case a fatal error is generated, so that you can log the error record
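The logging described in this list could be sketched roughly as follows. It assumes the Parallel Computing Toolbox function getCurrentTask is used to obtain the task object; the log folder, file names, and message format are hypothetical choices, and the simulation itself is left as a comment.

```matlab
% Sketch: per-worker log with timestamp, task ID, and parfor index.
numModels = 4;
logFolder = tempname;                 % hypothetical log location
mkdir(logFolder);

parfor gg = 1:numModels
    task = getCurrentTask();          % empty when running on the client
    if isempty(task)
        workerID = 0;
    else
        workerID = task.ID;
    end
    % One file per worker avoids two workers writing to the same file.
    logFile = fullfile(logFolder, sprintf('worker%d.log', workerID));
    fid = fopen(logFile, 'a');
    fprintf(fid, '%s  worker %d  gg=%d  task started\n', ...
        char(datetime('now')), workerID, gg);
    try
        % Run simulation and save outputs here.
        fprintf(fid, '%s  worker %d  gg=%d  task completed\n', ...
            char(datetime('now')), workerID, gg);
    catch err
        fprintf(fid, '%s  worker %d  gg=%d  error: %s\n', ...
            char(datetime('now')), workerID, gg, err.message);
    end
    fclose(fid);
end
```

A value of gg that appears with more than one "task started" entry would point to a repeated iteration, and the worker ID shows where each attempt ran.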
Samuele Gould on 6 Oct 2022
Thank you for these suggestions as well; I will implement them next time I am editing the code.
Edric Ellis on 7 Oct 2022
There are certain conditions under which a parfor loop can repeat iterations, but these ought to be rather rare. They are to do with two specific forms of error handling:
  • Dealing with missing files that need to be auto-attached
  • Handling crashed workers
While running your code, do you see any messages in the MATLAB command window related to analysing and attaching files? If so, you can pre-empt this by using addAttachedFiles to send the code to the workers ahead of time. If the workers are crashing... well, that might be harder to avoid.
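A minimal sketch of attaching code to the pool ahead of time, assuming a process-based pool. The stub file written here (runSimulationStub.m) is purely hypothetical and only exists to make the example self-contained; in practice the list would name the files the simulations actually call.

```matlab
% Sketch: send code to the workers before the parfor loop starts.
helperFolder = tempname;              % hypothetical folder for the stub
mkdir(helperFolder);
helperFile = fullfile(helperFolder, 'runSimulationStub.m');
fid = fopen(helperFile, 'w');
fprintf(fid, 'function out = runSimulationStub(gg)\nout = gg;\nend\n');
fclose(fid);

pool = gcp();                         % current pool, started if needed
addAttachedFiles(pool, {helperFile}); % copy the file to every worker
disp(pool.AttachedFiles)              % files now attached to the pool
```

With the files attached up front, the loop should not need to pause to analyse and attach them on the first missing-file error.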


Answers (0)


Release: R2022b

Asked: 5 Oct 2022

Commented: 7 Oct 2022
