Parfor HPC Cluster - How to Assign Objects to Same Core Consistently?

15 views (last 30 days)
Hello,
TL;DR: Is there a way to force MATLAB to consistently assign a classdef object to the same core, when a parfor loop runs inside another loop?
Details:
I'm working on a fairly complex/large scale project which involves a large number of classdef objects & a 3D simulation. I'm running on an HPC cluster using the Slurm scheduler.
The 3D simulation has to run in a serial triple loop (at least for now; that's not the bottleneck).
The bottleneck is the array of objects, each of which stores its own state & calls ode15s once per iteration. These are all independent so I want to run this part in a parfor loop, and this step takes much longer than the triple loop right now.
I'm running on a small test chunk within the 3D space, with about 1200 independent objects. Ultimately this will need to scale about 100x to 150,000 objects, so I need to make this as efficient as possible.
It looks like MATLAB is smartly assigning the same object to the same core for the first ~704 objects, but after that it toggles randomly between 2 cores & a few others.
The plot shows ~20 loop iterations (going downward), with the ~1200 class objects on the x-axis; the colors represent the core/task assignment on each iteration. The assignment matrix was built inside the parfor loop with:
task = getCurrentTask();
coreID(ti, ci) = task.ID;
A second plot was created after constructing the objects in a parfor loop, but that didn't help.
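For reference, a self-contained sketch of that worker-ID logging pattern might look like this (the loop sizes and the imagesc call are purely illustrative):
n_timesteps_sample = 20;   % number of outer iterations shown in the plot (assumed)
n_objects = 1200;
coreID = zeros(n_timesteps_sample, n_objects);
for ti = 1:n_timesteps_sample
    parfor ci = 1:n_objects
        task = getCurrentTask();     % worker/task running this iteration (empty if no pool)
        coreID(ti, ci) = task.ID;    % sliced assignment: row = iteration, column = object
    end
end
imagesc(coreID); xlabel('object index'); ylabel('iteration'); % visualize worker assignment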
The basic structure of the code is this:
% pseudocode:
n_objects = 1200; % this needs to scale up to ~150,000 (so ~100x)
for i = 1:n_objects
    object_array(i) = constructor();
    % also tried doing this as parfor, but it didn't help
end
% ... other setup code ...
% Big Loop:
dt = 1; % seconds
n_timesteps = 10000;
for i = 1:n_timesteps
    % unavoidable 3D triple-loop update
    update3D(dt);
    parfor j = 1:n_objects
        % each object depends on 1 scalar from the 3D matrix
        object_array(j).update_ODEs(dt); % each object calls ode15s independently
    end
    % update the 3D matrix with 1 scalar from each ODE object
end
I've tried adding more RAM per core, but for some reason the assignment still seems to break after the ~704th object, which is interesting.
And doing the object initialization/constructors inside a parfor loop made the initial core assignments less consistent (top row of plot).
Anyway, thank you for your help & please let me know if you have any ideas!
I'm also curious if there's a way to make the "Big Loop" the parfor loop, and make a "serial critical section" or something for the 3D part? Or some other hack like that?
Thank you!
ETA 7/28/25: Updated the pseudocode to show dt & the scalar values passed between the 3D simulation & the ODE objects.
3 Comments
Douglas Brantner on 28 Jul 2025
Edited: Douglas Brantner on 28 Jul 2025
Thanks for your help - I'm not sure if this answers your question, but I need the external "big loop" because I'm taking 1 time step in the 3D simulation, then 1 time step in the ODE simulation, and I need to iterate over the two because they influence each other.
There are likely several thousand timesteps, so wouldn't creating & destroying the objects on each iteration add a lot of overhead?
I suppose the object might be overkill (there's a lot of support code, plotting, analysis, etc. that isn't specifically needed for the simulation itself)... but I need the state vector for each ODE instance to be preserved across the "Big Loop" and updated on each step, for each object (of which there will be ~150,000, and possibly even more at full scale).
I could make a matrix of state vectors & just call the ODE solver on each row/column of the matrix in parallel (roughly as sketched at the end of this comment)... but is there a way to force assignment to a specific core's local RAM/memory so the ODE solvers will run nicely in parallel without passing lots of data back & forth?
There's only 1 scalar number (per voxel/instance) that actually needs to be passed back & forth from the 3D simulation to the ODE instances.
Thanks!
PS - I updated the pseudocode a bit w/ the scalar interaction to clarify.
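For concreteness, a minimal sketch of that state-matrix idea might look like the following (dt and n_objects are as in the pseudocode above; ode_rhs, n_states, and which scalar is passed back to the 3D simulation are illustrative assumptions, not from the original code):
n_states = 10;                          % length of each ODE state vector (assumed)
states = zeros(n_objects, n_states);    % row k = current state of object k
coupling = zeros(n_objects, 1);         % the one scalar passed in from the 3D sim
out_scalar = zeros(n_objects, 1);       % the one scalar passed back to the 3D sim
parfor k = 1:n_objects
    c = coupling(k);                                % per-object scalar from the 3D sim
    odefun = @(t, y) ode_rhs(t, y, c);              % parameterize the (hypothetical) RHS
    [~, Y] = ode15s(odefun, [0 dt], states(k, :));  % advance this object by one timestep
    states(k, :) = Y(end, :);                       % keep only the final state
    out_scalar(k) = Y(end, 1);                      % e.g. first state variable goes back to the 3D sim
end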
Douglas Brantner on 28 Jul 2025
I also tried breaking the parfor loop into blocks, where (for example) each block has size 32 if there are 32 workers.
This made the striping very nice in the object/core assignment graph, but it was *significantly* slower than one big parfor.
n_workers = 32;
n_blocks = n_objects / n_workers;
for t = 1:n_timesteps % big loop over timesteps
    % 3D simulation (triple loop)
    for i = 1:n_blocks
        parfor j = 1:n_workers
            % operate on object (i-1)*n_workers + j, i.e. one block of objects
            % at a time, to try to force each object onto the same core
            % on every timestep
        end
    end
end


Answers (1)

Edric Ellis on 29 Jul 2025
I think this might be a case for spmd. With spmd, you can ensure you construct the objects on particular workers, and only ever operate on them there. The following code assumes you can divide the number of objects evenly (if you can't, you'll need to do a bit more bookkeeping).
spmd
    % Construct objects directly on the workers
    n_per_worker = n_objects / spmdSize;
    for i = 1:n_per_worker
        object_array(i) = constructor();
    end
    % Big loop
    dt = 1; % seconds
    n_timesteps = 10000;
    for i = 1:n_timesteps
        update3D(dt); % Not sure what this needs to modify...
        for j = 1:n_per_worker
            object_array(j).update_ODEs(dt);
            % Extract the scalar from each object
            scalar_per_obj(j) = object_array(j).get_scalar();
        end
        % Get all the scalars across all workers
        all_scalars = spmdCat(scalar_per_obj);
        % Do something with all_scalars...
    end
end
In this sketch, each worker constructs a vector of objects, and then operates on them independently. The spmdCat is an example showing how all workers can get all the scalar values, which I'm assuming they need to proceed to the next timestep. If you wish, you could have that piece run on only one worker by doing something more like this:
% call spmdCat, with result only on worker 1
dim = 1; % concatenation dimension
destination = 1;
all_scalars = spmdCat(scalar_per_obj, dim, destination);
if spmdIndex == destination
    result = sum(all_scalars.^2);
    % send result to all workers
    spmdBroadcast(destination, result);
else
    % Get result from "destination"
    result = spmdBroadcast(destination);
end
1 Comment
Douglas Brantner on 30 Jul 2025
Edited: Douglas Brantner on 30 Jul 2025
Thank you! I just started reading about spmd and I think you might be right.
I'm also looking at "codistributed" arrays, which it seems let you split the data among many workers & "pin" it there. I might wind up abandoning the class & just making a large codistributed matrix where each row or column is the state vector for 1 ODE solver.
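For illustration, a minimal sketch of that codistributed layout might look like this (ode_rhs and the sizes are illustrative assumptions, not from the original code; each worker keeps its own block of rows for the whole run):
n_objects = 1200;
n_states = 10;       % length of each ODE state vector (assumed)
n_timesteps = 10000;
dt = 1;              % seconds
spmd
    % Distribute the state matrix by rows so each worker "owns" a fixed block of objects
    codist = codistributor1d(1); % distribute along dimension 1 (rows)
    states = codistributed.zeros(n_objects, n_states, codist);
    for t = 1:n_timesteps
        local_states = getLocalPart(states); % this worker's rows only
        for k = 1:size(local_states, 1)
            [~, Y] = ode15s(@ode_rhs, [0 dt], local_states(k, :)); % hypothetical RHS
            local_states(k, :) = Y(end, :);
        end
        % Reassemble the distributed matrix from the updated local parts
        states = codistributed.build(local_states, getCodistributor(states));
    end
end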
Are there any memory optimizations that can be made for repeated calling of ode15s? Like 'persistent' variables for internal data inside the ODE function? (There are a lot of intermediate values & sub-equations within the ODE system, so any way to avoid re-allocating that on each call on each object would help...)
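One pattern that may help here (an assumption on my part, not something specific to ode15s): precompute any constant coefficients once per call and let a nested right-hand-side function capture them, so they are not rebuilt on every solver evaluation. A rough sketch, where advance_one_step, params, and the RHS itself are hypothetical placeholders:
function states_out = advance_one_step(states_in, dt, params)
    % Anything constant is set up once here, outside the RHS.
    A = params.A; % e.g. a constant coefficient matrix (hypothetical)
    [~, Y] = ode15s(@rhs, [0 dt], states_in);
    states_out = Y(end, :);
    function dydt = rhs(~, y)
        % The nested function sees A (and any other precomputed data)
        % without rebuilding it on each solver call.
        dydt = A * y; % placeholder right-hand side
    end
end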


Categories

More about Parallel for-Loops (parfor) in Help Center and File Exchange.

Version

R2024a
