Unable to submit task result (Matlab parallel server)

Hi,
I am running some tests on a cluster. I create a job and submit several tasks to it, but I get the following error:
Error: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.).
Original cancel message:
java.lang.Exception: Unable to submit task result - MATLAB will now exit and restart.
Where should I start looking? What does this error mean in practice? Is it a problem on the client side or on the cluster side?

Answers (1)

Raymond Norris on 2 Dec 2021
Hi Maria,
A few questions first:
  • Which platform is MATLAB Parallel Server running on, Linux or Windows?
  • Which scheduler are you using (MJS, PBS, etc.)?
  • What size pool are you running?
  • How many cores per node?
  • How much RAM per node?
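Some of these answers can be collected from the workers themselves. The following is only a sketch, not part of the original answer: it assumes Linux worker nodes and the default cluster profile, and `numProbes` is a hypothetical value. Each task runs the built-in `system` with standard Linux commands, so nothing has to be attached to the job. There is no guarantee that the tasks land on different nodes.

```matlab
% Sketch only: assumes Linux worker nodes and your default cluster profile.
cluster = parcluster;
job = createJob(cluster);
numProbes = 4;  % hypothetical: enough tasks to reach the nodes you care about
cmd = 'hostname; nproc; grep MemTotal /proc/meminfo';
for k = 1:numProbes
    % system is a MATLAB built-in, so no files need to be attached to the job
    createTask(job, @system, 2, {cmd});
end
submit(job);
wait(job);
out = fetchOutputs(job);  % numProbes-by-2 cell array: {status, cmdout} per task
for k = 1:size(out, 1)
    fprintf('--- task %d (exit status %d) ---\n%s', k, out{k, 1}, out{k, 2});
end
delete(job);  % remove the probe job from the cluster's job storage
```

Dividing MemTotal by the number of workers started on that node gives a rough memory budget per worker.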
If you're not using MJS, try the following. I'll show both batch and parpool.
setenv('MDCE_DEBUG','true')
cluster = parcluster;
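% If you're using batch ('myScript' is a placeholder for your own script)
job = cluster.batch('myScript');
job.wait
% A batch job without a pool is an independent job, so pass its task
cluster.getDebugLog(job.Tasks(1))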
% If you're using parpool
pctconfig('preservejobs',true);
pool = cluster.parpool();
% A parpool runs as a communicating job, so pass the job itself
cluster.getDebugLog(cluster.Jobs(end))
If you're using MJS:
mjs = parcluster;
mjs.ClusterLogLevel = 4;
% Call either batch or parpool
mjs.getClusterLogs()
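Whichever scheduler you use, it can also help to see which of your tasks failed and what each one reported. This is a hedged sketch, not from the original answer: it assumes the failing job is still stored on the cluster and is the most recent entry in `cluster.Jobs`. It uses the documented task properties `ID`, `State` and `ErrorMessage`, and calls `getDebugLog` only for non-MJS clusters. The debug output is most informative if `MDCE_DEBUG` was set before the job was submitted.

```matlab
% Sketch: assumes the failing job is still stored on the cluster and is
% the most recent job in cluster.Jobs.
cluster = parcluster;
job = cluster.Jobs(end);
tasks = job.Tasks;
for k = 1:numel(tasks)
    t = tasks(k);
    fprintf('Task %d: state = %s\n', t.ID, t.State);
    if ~isempty(t.ErrorMessage)
        % ErrorMessage is empty for tasks that completed without error
        fprintf('  error: %s\n', t.ErrorMessage);
    end
    % Worker-side debug logs are only available for non-MJS clusters
    if ~isa(cluster, 'parallel.cluster.MJS')
        disp(getDebugLog(cluster, t));
    end
end
```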
The log files may show more detail. If I had to guess, I'd bet the workers are running out of memory.
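One way to test the memory hypothesis is to compare the peak memory of a single task with the RAM each worker can expect on a full node. The sketch below is an illustration under stated assumptions, not part of the original answer. It is Linux only because it reads `VmHWM` (peak resident memory, in kB) from `/proc/self/status`. `myTaskFcn`, `nodeRAM_GB` and `workersPerNode` are hypothetical placeholders that you would replace with your own task and cluster values. `VmHWM` covers the whole MATLAB process since startup, so running it in a fresh `-nodesktop` session gives the closest estimate.

```matlab
% Sketch (Linux only). myTaskFcn, nodeRAM_GB and workersPerNode are
% hypothetical placeholders: substitute your own task and cluster values.
myTaskFcn = @() sum(rand(5000));  % hypothetical stand-in for one task's work
nodeRAM_GB = 64;                  % hypothetical: RAM per node
workersPerNode = 32;              % hypothetical: workers started per node

myTaskFcn();  % run one task's worth of work in this MATLAB session

% VmHWM is the peak resident set size of this MATLAB process in kB,
% so it includes MATLAB's own baseline footprint as well as the task.
txt = fileread('/proc/self/status');
tok = regexp(txt, 'VmHWM:\s*(\d+)\s*kB', 'tokens', 'once');
peakGB = str2double(tok{1}) / 2^20;

budgetGB = nodeRAM_GB / workersPerNode;
fprintf('Peak per process: %.2f GB, budget per worker: %.2f GB\n', peakGB, budgetGB);
if peakGB > budgetGB
    disp('A fully loaded node is likely to run out of memory.');
end
```

If the peak exceeds the budget, running fewer workers per node (or reducing the memory each task needs) is the usual next step to try.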
