Volatile GPU-Util is 0% during Neural network training
Hello.
I would like to train my neural network with 4 GPUs (on a remote server).
To use the GPUs, I set ExecutionEnvironment to 'multi-gpu' in the training options.
However, Volatile GPU-Util remains at 0% during training, even though the data does appear to be loaded into GPU memory.
I would appreciate your help.
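For reference, the relevant setting is made in trainingOptions roughly like this (the solver, datastore, layers, and other option values here are illustrative placeholders, not the asker's actual code):

```matlab
% Sketch of a multi-GPU training setup; solver and option values are placeholders
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="multi-gpu", ...  % use all available local GPUs
    MiniBatchSize=128, ...
    MaxEpochs=10, ...
    Plots="training-progress");

% ds and layers are assumed to be defined elsewhere
net = trainNetwork(ds, layers, options);
```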

8 comments
Joss Knight
8 Sept 2023
Looks like you have 14 MATLABs running per device. That's never going to work. What code are you running? It's hard to help without knowing what you were actually doing. Are they all your processes, or are you using a shared machine?
486 MiB is how much memory a process reserves on the device when it is selected; it's probably not your data.
기태 김
9 Sept 2023
Edited: Walter Roberson
9 Sept 2023
Sam Marshalik
10 Sept 2023
Quick clarification question. You mentioned that you are using "a remote server". Are you logging into that remote server manually and running MATLAB or are you running MATLAB Parallel Server to access those remote GPUs?
기태 김
10 Sept 2023
Joss Knight
10 Sept 2023
Right, but is the server shared with other people? We need to explain why there are at least 25 MATLABs running on the machine. There should be just one for the client MATLAB you launched, plus 3 for the parallel pool, which should have opened automatically with three workers. Have you opened MATLAB lots of times?
Joss Knight
12 Sept 2023
Edited: Joss Knight
12 Sept 2023
Right, so the parfor is opening a pool with a lot of workers (presumably you have a large number of CPU cores), but unfortunately these are then not used for your preprocessing during training. You need to enable DispatchInBackground as well. Try that. You should have received a warning on the first run telling you that most of your workers were not going to be used for training.
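A minimal sketch of enabling DispatchInBackground alongside the multi-GPU setting (the other option values are illustrative, not the asker's actual configuration):

```matlab
% With DispatchInBackground=true, parallel pool workers read and transform
% mini-batches while the GPUs train, instead of sitting idle.
% Other option values below are placeholders.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="multi-gpu", ...
    DispatchInBackground=true, ...  % preprocess data on pool workers
    MiniBatchSize=128, ...
    MaxEpochs=10);
```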
It does look as though the general problem is that your data preprocessing is dominating the training time, meaning only a small proportion of each second is spent computing gradients, and that is what the utilization figure measures. If DispatchInBackground doesn't help, we can explore further how to vectorize your transform functions; you might also consider using augmentedImageDatastore, which provides most of what you need. Or you could preprocess the data on the GPU.
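For the augmentedImageDatastore suggestion, a sketch along these lines can replace hand-written transform functions (the target size, augmentation settings, and the source datastore name `imds` are assumptions for illustration):

```matlab
% Sketch: resize and augment images on the fly instead of custom transforms.
% Augmentation settings and the 224x224 target size are illustrative.
augmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandRotation=[-10 10]);

% imds is assumed to be an existing imageDatastore
augds = augmentedImageDatastore([224 224], imds, ...
    DataAugmentation=augmenter);
```

Passing `augds` to trainNetwork then handles per-batch resizing and augmentation efficiently, and it combines with DispatchInBackground so the work runs on pool workers.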
기태 김
14 Sept 2023