SLURM Distributed Computing error

Sanjay on 7 Dec 2013
Answered: Raymond Norris on 13 Dec 2013
Hi,
I am using a MATLAB script to distribute tasks through SLURM. However, I keep getting the following error in the script (SubmitQueuedJobs.m), which I originally wrote for SGE and am now adapting for SLURM. I don't understand what is happening. Could you please help me?
"Error in SubmitQueuedJobs (line 76) job{j} = createJob(j); % createSgeJob(0);"
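The quoted line is only the stack frame. The error identifier and message that MATLAB prints above it usually say what actually failed. A minimal sketch for capturing them, assuming a hypothetical helper named describeError that is not part of the original script:

```matlab
function report = describeError(fcn)
% describeError  Run fcn() and return a text report of any error it throws.
% Hypothetical helper for illustration; not part of SubmitQueuedJobs.m.
report = '';
try
    fcn();
catch err
    % The identifier and message are what MATLAB prints above "Error in ..."
    report = sprintf('%s\n%s', err.identifier, err.message);
    % The stack lists the function name and line number of each frame
    for k = 1:numel(err.stack)
        report = sprintf('%s\n  in %s (line %d)', report, ...
            err.stack(k).name, err.stack(k).line);
    end
end
end
```

Calling disp(describeError(@() createJob(0))) at the command line would show the full message for the failing call. An empty result means the call did not throw.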
Here's the code:
function JobOut = SubmitQueuedJobs(nJobs,JobFcnHandle,nJobArgOut,JobOptions,...
ListPathDependencies,nMaxWorkers,ShowOutput,LogFileName,OutputPreamble)
% SubmitQueuedJobs
%
% Submit parallel jobs that might exceed the maximum number of workers
% available.
%
% Mandatory Inputs:
%
% nJobs
%     Number of jobs to run.
%
% JobFcnHandle
%     Handle to the function called by each worker.
%
% nJobArgOut
%     Number of output arguments in each job.
%
% JobOptions
%     Cell array with options required by the function called by the worker.
%     It needs to contain one element per job, unless no inputs are
%     necessary, in which case {} needs to be used.
%
% ListPathDependencies
%     Cell array with the list of path dependencies needed in the function
%     called by workers.
%
% nMaxWorkers
%     [Optional] Maximum number of simultaneous workers. Default: 4.
%
% ShowOutput
%     If set to 1, all output of each worker is shown. If LogFileName is not
%     specified, it is shown in the caller's command window. If
%     LogFileName is specified, it is stored in the log file.
%     Default: 1
%
% LogFileName
%     If specified, all output is saved in an ASCII file.
%
% OutputPreamble
%     If specified, this should be a format string passed to fprintf
%     before showing the output of each worker. It must accept the
%     worker number as input.
%
if ~exist('nMaxWorkers','var'), nMaxWorkers = 4; end
if ~exist('ShowOutput','var'), ShowOutput = 1; end
if ~exist('LogFileName','var') || isempty(LogFileName), Save2Log = 0; else Save2Log = 1; end
if isempty(JobOptions), for j = 1:nJobs, JobOptions{j} = {}; end, end
nCompleted = 0;
nSubmitted = 0;
j = 0;
jobsRunning = false(nJobs,1);
JobOut = cell(nJobs,1);
while nCompleted < nJobs
    while (nSubmitted < nMaxWorkers) && (j < nJobs)
        % submit new jobs
        j = j+1;
        nSubmitted = nSubmitted+1;
        job{j} = createJob(0); %createSgeJob(0)
        set(job{j},'PathDependencies',ListPathDependencies)
        TaskID{j} = createTask(job{j},JobFcnHandle,nJobArgOut,JobOptions{j});
        if ShowOutput
            set(TaskID{j}, 'CaptureCommandWindowOutput', true);
        end
        submit(job{j})
        jobsRunning(j) = true;
    end
    pause(0.01)
    listJobsRunning = find(jobsRunning);
    for jj = 1:length(listJobsRunning)
        if strcmp(get(job{listJobsRunning(jj)},'State'),'finished')
            jobsRunning(listJobsRunning(jj)) = false;
            nSubmitted = nSubmitted-1;
            nCompleted = nCompleted+1;
        end
    end
end
for j = 1:nJobs
    if ShowOutput
        if Save2Log
            jobLogName = sprintf('%s%.0f.log',LogFileName,j);
            fid = fopen(jobLogName,'wt');
            if exist('OutputPreamble','var')
                fprintf(fid,OutputPreamble,j);
            end
            fprintf(fid,strrep(get(TaskID{j},'CommandWindowOutput'),'%','%%'));
            fclose(fid);
        else
            if exist('OutputPreamble','var')
                fprintf(OutputPreamble,j);
            end
            fprintf(strrep(get(TaskID{j},'CommandWindowOutput'),'%','%%'));
        end
    end
    if ~isempty(get(TaskID{j},'errormessage'))
        fprintf(strrep(get(TaskID{j},'erroridentifier'),'%','%%')); fprintf('\n');
        fprintf(strrep(get(TaskID{j},'errormessage'),'%','%%')); fprintf('\n');
    end
    JobOut{j} = getAllOutputArguments(job{j});
    destroy(job{j})
end
JobOut = cat(1,JobOut{:});
end
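
For comparison, here is a minimal sketch of one submit-and-collect cycle using the cluster-object interface of Parallel Computing Toolbox (parcluster, createJob, createTask, submit, wait, fetchOutputs). In that interface createJob takes a cluster object as its first argument, so a numeric argument such as 0 or j may be what triggers the reported error, although the post does not include the full message. The function name runOneTask and the profile name passed to it are assumptions for illustration. A cluster profile for the SLURM system is assumed to exist already.

```matlab
function out = runOneTask(profileName)
% runOneTask  Submit a single-task job to a cluster and return its outputs.
% Hypothetical example; profileName must name an existing cluster profile.
out = {};
c = parcluster(profileName);        % cluster object for the named profile
job = createJob(c);                 % createJob takes the cluster object
% One task that calls rand(2,3) and returns one output argument
task = createTask(job, @rand, 1, {2, 3});
task.CaptureDiary = true;           % keep the worker's command-window output
submit(job);
wait(job);                          % block until the job reaches a final state
if isempty(task.ErrorMessage)
    out = fetchOutputs(job);        % cell array with one row per task
else
    fprintf('%s\n%s\n', task.ErrorIdentifier, task.ErrorMessage);
end
disp(task.Diary);                   % text the worker printed, if any
delete(job);                        % remove the job's data from job storage
end
```

In this interface the job property AdditionalPaths plays the role of PathDependencies, CaptureDiary and Diary replace CaptureCommandWindowOutput and CommandWindowOutput, and fetchOutputs and delete replace getAllOutputArguments and destroy.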

Answers (1)

Raymond Norris on 13 Dec 2013
If it helps, there are SLURM integration scripts on File Exchange:
http://www.mathworks.com/matlabcentral/fileexchange/29910-slurm-integration-scripts
