How is labindex assigned to the workers inside a node/machine in MDCS?
raym
on 25 May 2018
We know that in MDCS we can choose to create more than one worker per node/machine, say 4 workers per node/machine. So how is labindex assigned to these 4 workers? Are they always 1, 2, 3, 4 on each node, or do they increase continuously node by node, such as 5-8, 9-12, ..., or are they totally random, such as 1, 3, 9, 6 on a node/machine?
Accepted Answer
Edric Ellis
on 25 May 2018
You don't specify which cluster type you're using with MDCS, but I'm going to assume MJS for now. (Not all of what follows will be scheduler-specific).
labindex within an spmd context is equal to the task index executing on the worker. So, if you have 2 nodes each running 4 workers, and you run a single communicating job of size 8 (i.e. parpool('myMjsCluster', 8)), then the task indices are 1:8, as are the corresponding values of labindex.
MJS will endeavour to schedule things such that consecutive tasks are co-located on a single node - i.e. it will attempt to put tasks 1:4 on the first node, and 5:8 on the second. (Most other scheduler types will end up doing something similar, but by a different means).
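As a rough illustration of that layout, the sketch below computes which node each labindex would land on, and its position within that node, if the packing works out exactly as described. The values of workersPerNode and poolSize are assumptions chosen to match the 2-node, 4-worker example, and the formula only describes the intended layout; the scheduler attempts this packing but does not guarantee it.

```matlab
% Illustrative only: assumes consecutive labindex values are packed onto
% nodes in order, which the scheduler attempts but does not guarantee.
workersPerNode = 4;   % assumed number of workers per node
poolSize       = 8;   % assumed pool size (2 nodes x 4 workers)

labs       = 1:poolSize;                         % labindex values in the pool
nodeNumber = ceil(labs / workersPerNode);        % 1 1 1 1 2 2 2 2
localIndex = mod(labs - 1, workersPerNode) + 1;  % 1 2 3 4 1 2 3 4

% Rows: labindex, assumed node number, assumed index within the node
disp([labs; nodeNumber; localIndex]);
```

Because the packing is best-effort, this arithmetic is better read as a picture of the expected layout than as something to rely on in production code.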
Basically, what you need to do is come up with a mapping of labindex to hostname to work out which labs are located on which host, and then you can use that "local labindex" to pick which Java program to use. Here's one way.
spmd
    [s, hostname] = system('hostname');
    assert(s == 0, 'Failed to compute hostname');
    hostname = strtrim(hostname);
    % Get a list of all hostnames in the pool
    allHostnames = gcat({hostname}, 1);
    % Work out which labindex values are on this host
    allLabs = 1:numlabs;
    labsOnThisHost = allLabs(strcmp(hostname, allHostnames))
    % Work out this lab's position among the labs on this host
    myIndexOnThisHost = find(labindex == labsOnThisHost)
end
More Answers (1)
Walter Roberson
on 25 May 2018
"The value of labindex spans from 1 to n, where n is the number of workers running the current job, defined by numlabs"
"This was done by pause a random seconds and then detect if there is ###.exe running in the tasklist of this node."
I would probably think in terms of having
if labindex == 1
    % Check in case the external software is somehow already running.
    % Otherwise:
    %   launch the external software
    %   do any waiting for the external software to be ready to go
end
labbarrier();