How to set up a MATLAB parallel cluster for a thread-based environment

Hi,
I am starting to explore Matlab Parallel functionalities, and, I have to say, I am a bit confused about the process-based vs. thread-based environment.
First question: I have 2 clusters, namely, the local cluster and the "MatlabCluster" (remote cluster with 8 nodes, 32 workers). If I use
pool = parpool('MatlabCluster');
the default environment is the "process-based" environment. Correct? Can I use the remote cluster in a "thread-based" environment? If I do
pool = parpool('threads');
only the local machine switches to a thread-based pool. Can I do the same with the remote cluster?
Second question: I am experimenting with distributed arrays. However, if I start the 'MatlabCluster' (remote cluster), I get a few errors, and the last error message is
No workers are available for FevalQueue execution
This happens on the line of code that uses distributed arrays. I read that FevalQueue is not supported in the thread-based environment. Does this error mean that, by default, the remote cluster is starting as thread-based (which would contradict my first hypothesis)?

Accepted Answer

Raymond Norris on 16 Jun 2021
The thread-based pool only runs on the same machine as the MATLAB client, similar to a local process-based pool. However, unlike the local pool, the threaded pool has a fixed startup size, which is the value returned by maxNumCompThreads. If you want a different number of workers in a threads pool, you have to set it first with maxNumCompThreads. For example:
% Let's assume you have 8 physical cores, but only want to start a threaded
% pool of 2 workers.
old_threads = maxNumCompThreads(2);
parpool("threads");
Starting parallel pool (parpool) ...
Connected to the parallel pool (number of workers: 2).
ans =
  ThreadPool with properties:
    NumWorkers: 2
Keep in mind that setting maxNumCompThreads, in addition to affecting the number of workers started, may have an effect on your other MATLAB code.
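One way to limit that side effect is to restore the previous thread limit once the threaded pool is no longer needed. The following is only a minimal sketch, assuming a pool size of 2 as in the example above; the variable names are illustrative.

```matlab
% Sketch: start a 2-worker thread pool, then put the thread limit back.
oldThreads = maxNumCompThreads(2);   % returns the previous limit
pool = parpool("threads");           % startup size follows maxNumCompThreads

% ... parallel work (parfor, parfeval, ...) goes here ...

delete(pool);                        % shut the threaded pool down
maxNumCompThreads(oldThreads);       % restore the original limit
```

Restoring the limit after deleting the pool keeps the rest of the session's multithreaded code running with its original thread count.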
You'll need to post a bit more (code, errors) to decipher the FevalQueue error.
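In the meantime, a quick way to see which kind of pool is actually running is to inspect the current pool object. This is a hedged sketch: it assumes the thread-based pool object is of class parallel.ThreadPool, so print class(pool) as well to confirm the class name on your release.

```matlab
% Sketch: report whether the current pool is thread-based or process-based.
pool = gcp('nocreate');              % current pool, or empty if none is running
if isempty(pool)
    poolKind = 'none';
elseif isa(pool, 'parallel.ThreadPool')   % assumed class name, check class(pool)
    poolKind = 'thread-based';
else
    poolKind = 'process-based';
end
fprintf('Current pool: %s (class: %s)\n', poolKind, class(pool));
```

Running this right after parpool('MatlabCluster') should show whether the remote pool came up as a process-based pool.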
  2 Comments
Maria on 16 Jun 2021
Edited: Maria on 16 Jun 2021
Thank you for your answer. With respect to the FevalQueue error, I am running some more tests, and I am starting to think that there is a problem with the distributed memory of the nodes. I have some issues with the cluster that we set up, and I have been in contact with MathWorks support for a week or so. However, I am able to run the remote cluster to some extent. I tried some code with a couple of parfor loops and I could see that all 32 workers were working.
Now, I tried to run a very simple test:
A = magic(4);
B = distributed(A);
And I get the warning:
Warning: The SPMD infrastructure has been initializing for 94 seconds. This may indicate a problem in initialization.
You might need to restart the pool.
And then
Error using distributed (line 282)
One or more futures resulted in an error.
Caused by:
No workers are available for FevalQueue execution.
The cluster has 8 nodes and 32 workers, all running Debian 10.9 (Buster). The client machine is also Linux-based. The job scheduler is MJS. The firewall is disabled on all nodes; we already ran tests that included disabling the firewall on the client machine, and we excluded it as the problem.
During validation, the parpool "hangs" and I have to manually terminate MATLAB because it no longer responds. This happens only when we use more than 1 node in the cluster.
How do I check the memory set up among the nodes of the cluster?
Raymond Norris on 16 Jun 2021
To get the memory on the Linux nodes, run
free -mth
This will give you the free & used memory.
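If logging in to each node is inconvenient, the same command can be collected from inside MATLAB. The following is only a sketch, under the assumptions that a process-based pool on the cluster is already open, that parfor works on it (as reported above), and that the workers run Linux. parfor does not guarantee one iteration per worker, so some nodes may be missing from the report.

```matlab
% Sketch: gather 'free -mth' output from the nodes behind the current pool.
pool = gcp('nocreate');
assert(~isempty(pool), 'Start the pool first, e.g. parpool(''MatlabCluster'').');
n = pool.NumWorkers;
hosts = cell(1, n);
memory = cell(1, n);
parfor i = 1:n
    [~, h] = system('hostname');     % node the iteration ran on
    [~, m] = system('free -mth');    % memory report from that node
    hosts{i} = strtrim(h);
    memory{i} = m;
end

% Print one report per distinct host name.
[uniqueHosts, firstIdx] = unique(hosts);
for k = 1:numel(uniqueHosts)
    fprintf('--- %s ---\n%s\n', uniqueHosts{k}, memory{firstIdx(k)});
end
```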

Release

R2021a
