MATLAB Answers

Parfor solving optimization problems (Cplex) slower than for

Katarzyna Furmanska on 17 Mar 2020
I am trying to solve a bunch of optimization problems in parallel using Matlab Parallel Toolbox 2018b on my client (Win10) + Matlab Distributed Server 2018b on my 3-node cluster (Win7) with 52 workers. These are rather small problems, but there are hundreds of them, so theoretically parfor should be helpful in this case.
I am reading these problems from .lp files into a cell array and then solving them within a parfor loop, as below:
% subp_array is a 1xn cell array of Cplex problems
nThreads = 1; % I don't see any time benefit from giving it more than 1 thread
totalTime = tic;
parfor subp_index = 1:length(subp_array)
    iterTime = tic;
    prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
    prob.Param.parallel.Cur = -1; % set parallel option to opportunistic
    prob.Param.threads.Cur = nThreads; % set number of threads per problem
    prob.Param.mip.tolerances.mipgap.Cur = 0.01;
    prob.solve(); % solve the problem (assumed; the call is missing from the listing)
    % get time of particular iteration
    elapsedTime{subp_index} = toc(iterTime);
end
% get time of entire loop
elapsedTotalTime = toc(totalTime);
The problem is that this parfor loop with 10 problems on 16 workers runs for 32 s, compared to 1.5 s (sic!) for a regular for loop. When examining the timing results, it turns out that the elapsed times of the individual iterations are very short, but the overall loop time is still large...
These are values of elapsedTime array:
{[0.0275]} {[0.0317]} {[0.0274]} {[0.0314]} {[0.0695]} {[0.4816]} {[0.0808]} {[0.0343]} {[0.0399]} {[0.0845]}
which is in total less than 1 second!
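To put a number on the gap, here is a minimal sketch that sums the per-iteration times listed above and subtracts them from the loop's wall-clock time. It assumes the 32 s figure and the ten timings come from the same run; the variable names busyTime and overhead are only illustrative.

```matlab
% Per-iteration times reported above, in seconds
elapsedTime = {0.0275, 0.0317, 0.0274, 0.0314, 0.0695, ...
               0.4816, 0.0808, 0.0343, 0.0399, 0.0845};
elapsedTotalTime = 32;   % approximate wall-clock time of the parfor loop (assumed same run)

iterTimes = cell2mat(elapsedTime);        % 1x10 numeric vector
busyTime  = sum(iterTimes);               % time spent inside the loop bodies
overhead  = elapsedTotalTime - busyTime;  % everything else: transfer, scheduling, waiting

fprintf('Loop bodies: %.2f s, unaccounted: %.2f s\n', busyTime, overhead);
% prints: Loop bodies: 0.91 s, unaccounted: 31.09 s
```

Under that assumption, roughly 31 of the 32 seconds are spent outside the loop bodies.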
Is there anything in the syntax that may cause time delays? I am using sliced variables and assigning prob first so that the variable is not indexed multiple times; I have no idea what else can be done... Apparently, if I run parfor with M = 0 (sequential), it returns the result immediately (the difference is particularly visible with a few hundred problems). What could be making my parallel computation so slow?
Thanks in advance
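For readers unfamiliar with the M = 0 form mentioned above, this is a minimal sketch of the parfor (loopVar = range, M) syntax, where M is the maximum number of workers and 0 runs the body on the client. The loop body is a hypothetical stand-in for solving one problem, not the Cplex code from the post.

```matlab
n = 10;
out = zeros(1, n);
M = 0;                        % maximum number of workers; 0 runs the loop on the client

tSerial = tic;
parfor (k = 1:n, M)
    out(k) = sum(sqrt(1:k*1e4));   % stand-in workload (assumption, not the real solve)
end
serialTime = toc(tSerial);

fprintf('Sequential parfor (M = 0) took %.3f s\n', serialTime);
```

Timing the same loop with M set to the pool size gives a like-for-like comparison of the parallel overhead.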



Answers (1)

Edric Ellis on 17 Mar 2020
Does the performance improve much / not much / not at all if you run the parfor loop a second time without closing the pool?
If the performance does improve a lot, then it's likely that the slow-down was caused by the parfor infrastructure having to work out that the code wasn't available, and attaching it to the pool. A message is printed when this occurs, or you can check the result of calling listAutoAttachedFiles on the pool.
You can either live with that first-time slow-down, or attach the files up-front using addAttachedFiles.
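A minimal sketch of both calls, assuming a pool is already open (gcp may start one, depending on preferences). The file name solveSubproblem.m is a hypothetical placeholder for whatever code the loop body needs; it is not taken from the question.

```matlab
pool = gcp();                    % current pool; may start one, depending on preferences

% Show which files parfor has attached automatically so far
listAutoAttachedFiles(pool);

% Attach code up front. The file name is a hypothetical placeholder.
neededFiles = {'solveSubproblem.m'};
isPresent = cellfun(@(f) exist(f, 'file') == 2, neededFiles);
if any(isPresent)
    addAttachedFiles(pool, neededFiles(isPresent));
end

disp(pool.AttachedFiles);        % files explicitly attached to the pool
```

The exist check is only there so the sketch does not error when the placeholder file is absent.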
If the performance remains the same, perhaps the problem is the amount of data being transferred. Use ticBytes and tocBytes to investigate this. You could also experiment with stubbing-out most of the loop body. I.e. if you run a loop like this:
parfor subp_index = 1:length(subp_array)
    prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
end
how does that perform? That loop incurs the same amount of data transfer.
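The two suggestions can be combined: wrap the stubbed-out loop in ticBytes and tocBytes and time it. This is a sketch only; subp_array is filled with stand-in numeric data here (an assumption), whereas in the real case it would hold the Cplex problems.

```matlab
pool = gcp();                                   % current pool; may start one

% Stand-in data: 10 cells of 100x100 doubles (hypothetical, not Cplex objects)
subp_array = arrayfun(@(k) rand(100), 1:10, 'UniformOutput', false);

ticBytes(pool);                                 % start counting transferred bytes
stubTimer = tic;
parfor subp_index = 1:numel(subp_array)
    prob = subp_array{subp_index};              %#ok<NASGU> transfer only, no computation
end
stubTime = toc(stubTimer);
bytes = tocBytes(pool);                         % one row per worker: [sent to, received from]

fprintf('Stub loop: %.2f s, %d bytes sent to workers\n', stubTime, sum(bytes(:, 1)));
```

If stubTime is close to the full loop time while the byte count is small, the cost is in moving or deserializing the data rather than in its volume.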


Katarzyna Furmanska on 25 Mar 2020
Another observation is that the distcompdeserialize time decreases when the parfor loop is run a second time on the same (still alive) pool of workers. But the overall calculation time remains the same...
Edric Ellis on 25 Mar 2020
There's definitely a chunk of time in distcompdeserialize - that's an internal PCT function that is used when transferring data from the client process to a MATLAB worker process.
However, looking at the absolute times - there's still a big chunk of time going somewhere. The total time taken by remoteParallelFunction (which is the worker-side wrapper for the body of a parfor loop) is only ~0.6 seconds, but (if I've understood correctly) the overall loop takes much longer. I don't really have any good way to explain that.
I would go back to trying to run a version of the parfor loop with the data transfer in place, but the actual computations stubbed out. My suspicion is that it will still take basically the same amount of time. That would point to data transfer being the bottleneck, even though the actual number of bytes being transferred is relatively small...
Katarzyna Furmanska on 25 Mar 2020
Yup, the parfor loop with only the data transfer takes a similar amount of time, again spent mostly in distcompdeserialize.
The entire parfor loop takes around 100 s, and I have no idea either where this time comes from... While running, each Cplex problem prints a log, and you can see how slowly new logs appear. It looks as if the workers were waiting to pick up the next problem. And when running a large number of problems on all available cores, I noticed that Task Manager does not show any heavy computation: CPU usage is just a few percent, and the small per-core graphs show no activity. Could it be something on the machine side that prevents the problems from being solved in parallel and instead puts them in a sequential queue?
Thanks a million again for giving me so many useful tips!
