Parallel Processing Using Parsim Stalls on 'cleaning up parallel workers' after Successful Runs

Hello, I have a 48-core machine with 384 GB of RAM and am using parsim to run Simulink simulations in parallel on my local machine. I have 26 simulations to run. When I run them all in parallel, I run out of memory and MATLAB throws errors as a result.

I therefore decided to run only 3 jobs in parallel. CPU utilization is unfortunately low in that case, but I do not run out of memory; I use only about 120 GB. The 3 jobs took 1 hour to simulate. When the runs completed successfully, the "Cleaning up parallel workers" dialog box appeared and stayed active for 17 hours. Even if this step took only 20 minutes, I would already have lost much of the benefit of parallel computing, since the total time of the parallel computation would approach the serial case. I had to terminate the MATLAB processes to regain control of the computer.

My Simulink runs are 1000 seconds long. As a test, I ran them for 1 second instead and obtained results successfully, but that is not adequate for my purposes. I transfer about 200 MB of workspace variables into the Simulink.SimulationInput objects for each run, which is only a small fraction of my total available memory.

This is not the expected behavior: (1) I would expect MATLAB to use a swap file instead of failing when it runs out of RAM, and (2) more importantly, the "Cleaning up parallel workers" process clearly crashed with no indication of a crash.

Please advise me on how to succeed with parallel processing and overcome these issues. I am running MATLAB R2024a.

Answers (1)

Divyam on 11 Jul 2024
Hi Dave, this issue can arise for the following reasons:
  • High logging data volume: each run creates a "Simulink.SimulationOutput" object, which is stored in the machine's RAM.
  • Large DMR files are being generated in the machine's "temp" folder (if you are logging signal data).
This list is not exhaustive. Note, however, that memory consumption rises as the number of simulations increases, which may also be contributing to the issue.
The solutions to these problems are:
1. Reducing Logged Data Volume
Since multiple simulations are being run in this case, the logged simulation data will accumulate in your Simulation Data Inspector (where it persists even after the logged data in the MATLAB workspace is overwritten).
You can reduce the number of runs or limit the number of logged data points. You can also set a limit on archived runs, after which the "Simulation Data Inspector" starts deleting archived runs on a first-in, first-out (FIFO) basis.
Setting a maximum size for logged data can also help to resolve the "out of memory" issue. Alternatively, if you do not want the logged data to be recorded at all, you can set the "Record mode" to "false".
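As a rough sketch of the Simulation Data Inspector settings described above, the following uses the Simulink.sdi functions for archiving and recording. The limit of 3 archived runs is an arbitrary illustrative value, and whether these settings also need to be applied on each worker (for example through the parsim "SetupFcn" option) is an assumption you should check for your setup.

```matlab
% Sketch only: the archive limit of 3 is an arbitrary example value.
Simulink.sdi.setAutoArchiveMode(true);  % move the previous run to the archive when a new run starts
Simulink.sdi.setArchiveRunLimit(3);     % keep at most 3 archived runs; the oldest are deleted first

% Alternative: if the logged data does not need to reach the
% Simulation Data Inspector at all, turn recording off.
Simulink.sdi.setRecordData(false);

% To discard runs that have already accumulated in the repository:
Simulink.sdi.clear;
```

The archive limit and the recording switch are alternatives; in practice you would probably pick one of them rather than set both.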
A postSimFcn can be used to post-process the data from each run, either by discarding unnecessary data or by saving it externally or in the MATLAB file store.
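A minimal sketch of this idea is shown here, assuming a model called 'myModel' with a logged signal called 'speed' (both names are hypothetical) and Dataset-format signal logging under the default name 'logsout'. The helper keepSummary is also made up for illustration; it keeps one scalar and one signal per run and lets the rest of the output be discarded.

```matlab
% Sketch only. 'myModel' and the logged signal 'speed' are hypothetical names.
model   = 'myModel';
numSims = 26;

in(1:numSims) = Simulink.SimulationInput(model);
for k = 1:numSims
    % Each run hands its SimulationOutput to keepSummary after simulating.
    in(k) = in(k).setPostSimFcn(@(simOut) keepSummary(simOut, k));
end

out = parsim(in, 'ShowProgress', 'on');

% --- keepSummary.m: save as its own file on the path so the workers can resolve it ---
function s = keepSummary(simOut, k)
    % Assumes Dataset-format signal logging under the default name 'logsout'.
    sig = simOut.logsout.getElement('speed').Values;   % timeseries of the logged signal
    s.peak        = max(sig.Data);                     % small summary value returned to the client
    s.summaryFile = sprintf('summary_run_%03d.mat', k);
    save(s.summaryFile, 'sig');                        % keep one signal on disk, drop the rest
end
```

The fields of the returned struct (here peak and summaryFile) should then be available on each element of out, so the client only holds the reduced results.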
To decrease the load on RAM, select the "Log Dataset data to file" option in the model configuration under "Data Import/Export > Log Dataset data to file". This decreases the size of the "SimulationOutput" object, because the object then contains a "DatasetRef" instead of the actual data. The "DatasetRef" is a reference to the MAT-file location, which reduces memory usage.
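The same option can also be set programmatically on each Simulink.SimulationInput object. The sketch below assumes a hypothetical model name 'myModel' and made-up file names; giving each run its own file name is a precaution so that runs do not overwrite each other's logs.

```matlab
% Sketch only. 'myModel' and the file name pattern are hypothetical.
model   = 'myModel';
numSims = 26;

in(1:numSims) = Simulink.SimulationInput(model);
for k = 1:numSims
    in(k) = in(k).setModelParameter( ...
        'LoggingToFile',   'on', ...                               % write Dataset logging to a MAT-file
        'LoggingFileName', sprintf('logged_run_%03d.mat', k));     % one file per run
end

out = parsim(in);
```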
You can refer to this MATLAB answer for more information on how to use the "postSimFcn":
2. Reducing DMR File Size
If signal data is being logged, DMR files will be generated in the "temp" directory where each worker will log data in its own DMR file. The maximum size of the DMR files can be estimated using the formula:
MaxSize = Number_of_Workers * Size_of_Simulation_Output
This "MaxSize" should not exceed the memory of the machine; when it does, the result may be instability or errors. If you believe that the size of the DMR files is causing this issue, you can refer to this answer for further information on how to reduce it: https://www.mathworks.com/matlabcentral/answers/1973754-why-are-there-very-large-dmr-files-filling-up-my-hard-drive-temp-directory-when-running-parsim-or#answer_1244859
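To make the estimate concrete, here is a small worked example of the MaxSize formula. The 384 GB and the 3 workers come from the question; the 40 GB per simulation output is a purely hypothetical figure that you would replace with a measured value.

```matlab
% Sketch only: sizePerOutputGB is a hypothetical figure, not a measurement.
totalRamGB      = 384;   % RAM of the machine described in the question
numWorkers      = 3;     % workers used in the question
sizePerOutputGB = 40;    % assumed size of one simulation's output

maxSizeGB  = numWorkers * sizePerOutputGB;          % MaxSize = Number_of_Workers * Size_of_Simulation_Output
maxWorkers = floor(totalRamGB / sizePerOutputGB);   % largest worker count that keeps MaxSize within RAM

fprintf('Estimated MaxSize: %d GB of %d GB\n', maxSizeGB, totalRamGB);
fprintf('Workers that fit by this estimate: %d\n', maxWorkers);
```

Under these assumed numbers the estimate is 120 GB for 3 workers, and at most 9 workers would fit before the estimate exceeds the installed RAM.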
If you face any issues with the postSimFcn after using the solutions above, you can refer to this MATLAB answer and documentation:
