Problems using NVIDIA GeForce RTX 3090 for Deep Learning

Julius Å on 7 Dec 2020
Commented: Roland Kruse on 4 Feb 2021
I am having trouble using NVIDIA GeForce RTX 3090 cards to train neural networks with the Deep Learning Toolbox in MATLAB. The problems appear both as error messages and as strange behaviour when training several different CNNs in two different MATLAB releases.
With MATLAB R2020b, the following error is given when trying to start training a CNN:
Error using trainNetwork (line 183)
GPU support for convolutional neural networks requires a GPU device with compute capability 3.0 or higher.
With MATLAB R2019a, the following error occurs, e.g. when training a segmentation CNN on 256x256 input images with a batch size of 30 (and in several other cases with other types of data):
Error using trainNetwork (line 165)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.
With smaller batch sizes for this particular training process (10 and 20), training behaves strangely: the loss decreases only very slowly and stays almost flat (the original post showed this in loss plots from two completely different training processes for two different CNNs). The same behaviour appeared in different segmentation tasks using different data and different CNN architectures. On NVIDIA TITAN RTX cards, the same training processes ran without problems, and their loss curves showed no such similarity to each other.
According to https://se.mathworks.com/matlabcentral/answers/631134-rtx-3080-recompiling-issue-in-matlab-2020a#comment_1173538, incorrect behaviour has been observed with the new NVIDIA cards based on the Ampere architecture, especially when training CNNs. However, no workaround for that behaviour is mentioned.
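The R2020b error is puzzling on its face: an RTX 3090 reports CUDA compute capability 8.6, far above the 3.0 minimum quoted in the message, just like the TITAN RTX (7.5) and MX150 (6.1) that train correctly. The minimal Python sketch below illustrates this with a plain version comparison. It is not how MATLAB performs its own check. The helper name `classify` and the constants are hypothetical; the capability values are NVIDIA's published figures for these cards.

```python
# Illustrative only: NOT MATLAB's internal device check.
# CUDA compute capabilities as (major, minor) pairs for the cards in this thread.
DEVICES = {
    "GeForce RTX 3090": (8, 6),  # Ampere
    "TITAN RTX": (7, 5),         # Turing
    "GeForce MX150": (6, 1),     # Pascal
}

# Ampere compute capabilities. 8.9 is Ada Lovelace, so "major == 8" alone is not enough.
AMPERE = {(8, 0), (8, 6), (8, 7)}

MIN_CC = (3, 0)  # minimum quoted in the R2020b trainNetwork error message


def classify(cc):
    """Hypothetical helper: return (meets_minimum, is_ampere) for a (major, minor) pair."""
    # Tuple comparison orders by major first, then minor, like version numbers.
    return cc >= MIN_CC, cc in AMPERE


if __name__ == "__main__":
    for name, cc in DEVICES.items():
        ok, ampere = classify(cc)
        print(f"{name}: {cc[0]}.{cc[1]}, meets 3.0 minimum: {ok}, Ampere: {ampere}")
```

All three cards pass the numeric comparison. Only the RTX 3090 is flagged as Ampere. This suggests that the "compute capability 3.0 or higher" message probably reflects the release not supporting the Ampere architecture rather than a literal capability shortfall.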
Is there currently any known solution to these problems?

Answers (3)

Stephan on 7 Dec 2020
Julius Å on 7 Dec 2020
Thank you Stephan.
Unfortunately, I was already aware of this answer and should have mentioned it in the question to avoid being directed to it. I was wondering whether any progress has been made on this issue since then.
Stephan on 7 Dec 2020
Edited: Stephan on 7 Dec 2020
Since that answer is from September 2020, and it already gives the workaround along with a note that a fix will be available in future releases, I don't think so. If anything changes, you should update your release as soon as an update is available and look at the release notes of the corresponding update:
But I think this is what will happen:



Walter Adame Gonzalez on 13 Dec 2020
Hello Julius!
I have also tried training a CNN on my RTX 3090 GPU using MATLAB R2020b with my own 256x256x3 dataset. It shows exactly the same behavior as what you are reporting: validation accuracy plateaus almost immediately at 80%, and sometimes at the end of training it drops to around 60% (only in the last validation accuracy calculation).
I also tried running the training in the R2020a and R2019b releases, with the same abnormal outcome. Running the training on an NVIDIA MX150 and on my CPU (Core i7-10700) shows normal behavior. Since I got tired of failing, I've written code to run the training in Python on my 3090. Just let me know if you would like me to share my Python code with you.
Best,
Walter

Walter Adame Gonzalez on 17 Dec 2020
Good news!
I received the MATLAB R2021a pre-release version and it works now: no backwards compatibility needed, no compiling problems, it works smoothly. Tested on an RTX 3090 with drivers up to date as of December 16th. Good luck!
M J on 19 Dec 2020
So it's official with R2021a? Training networks works fine on the RTX 30 series?
Roland Kruse on 4 Feb 2021
Training CNNs with R2021a on RTX 30xx works well for me too, unlike R2020b with forward compatibility. Predict, however, does not work reliably: I get out-of-memory errors.
