Compatibility of MATLAB and GPU Coder with Compute Capability 8.6 (RTX 3070)
Marco Irisarri
on 16 Jul 2021
Commented: Marco Irisarri
on 5 Aug 2021
Good morning,
I recently bought an RTX 3070 and have been trying to use it by generating CUDA code with GPU Coder. The card works, but I have noticed two things. I have MATLAB R2021a, the latest NVIDIA drivers, and all the programs GPU Coder requires (as explained in https://es.mathworks.com/help/gpucoder/gs/install-prerequisites.html), and the "coder.checkGpuInstall" command shows the following (see attached .txt).
(i) When running gpuBench, the results seem to indicate that single-precision TFLOPS are about half of the card's theoretical value (see the enclosed figure). In contrast, third-party tools such as CUDA-Z (also below) show that the card reaches about 22 TFLOPS. Does this mean that MATLAB is currently using only half the CUDA cores per SM? Am I missing something obvious?
Figure 1: MATLAB gpuBench results
Figure 2: CUDA-Z results
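The theoretical single-precision peak quoted for a card is normally computed as CUDA cores × 2 floating-point operations per cycle (one fused multiply-add) × clock rate. A minimal MATLAB sketch of that arithmetic follows; the core count (5888) and boost clock (1.725 GHz) are NVIDIA's published RTX 3070 figures, used here as assumptions, and the clock the card actually sustains under load may differ.

```matlab
% Theoretical FP32 peak: each CUDA core can retire one fused
% multiply-add (2 floating-point operations) per clock cycle.
% cores * 2 * GHz gives GFLOPS; dividing by 1e3 gives TFLOPS.
peakTflops = @(cudaCores, clockGHz) cudaCores * 2 * clockGHz / 1e3;

% Assumed RTX 3070 figures (NVIDIA's published specification).
rtx3070Cores    = 5888;
rtx3070BoostGHz = 1.725;

fprintf('Theoretical FP32 peak: %.1f TFLOPS\n', ...
        peakTflops(rtx3070Cores, rtx3070BoostGHz));
```

At the rated boost clock this prints about 20.3 TFLOPS. A measured figure somewhat above that, such as the roughly 22 TFLOPS from CUDA-Z, would be consistent with the card running above its rated boost clock during the test; that is an inference, not something either tool reports here.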
(ii) I was trying to profile the code generated by GPU Coder by following the steps in https://es.mathworks.com/help/gpucoder/ug/gpucoder-execution-profiling-report.html (in fact, by running the example in "C:\ (...) \Documents\MATLAB\Examples\R2021a\gpucoder\GPUExecutionProfilingOfTheGeneratedCodeExample"), and I get the following error message:
"
Error using gpucoder.profile (line 41)
Error setting property 'ComputeCapability' of class 'GpuConfig': Invalid value '8.6'.
Allowed values are:
3.2, 3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, 7.2, 7.5, 8.0
"
Does this again mean that compute capability 8.6 is not yet supported?
I have tried downloading the MATLAB R2021b prerelease, but unfortunately it does not install properly: the MATLAB files are in the directory, but the launcher does not appear anywhere, and when I launch the .exe among those files I get an error (unfortunately I don't have it now to show you).
Thank you in advance for your help; I hope my question was clear and concise. This is my first question, so feedback on how to improve is very welcome.
Best,
Accepted Answer
Nathan Malimban
on 19 Jul 2021
Hi Marco,
I can address the second part of your question for now.
Yes, MATLAB R2021a does not support compute capability (CC) 8.6, because the CUDA version supported by R2021a does not support CC 8.6. The error seems to occur because, when the user does not specify a CC, the profiler code picks up the default CC from the machine. This may not be the correct default to choose, however; I can create an internal report so we can look into it further. In the meantime, could you try the following workaround? It specifies the CC explicitly.
gpucoder.profile(designFileName, inputs, 'GpuConfigurationOptions', {'ComputeCapability', '8.0'});
designFileName is the name of your design file, and inputs is a cell array of inputs, as in the example you referenced.
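The workaround amounts to passing a compute capability that this release accepts instead of the one the device reports. A hedged MATLAB sketch of that selection follows, assuming Parallel Computing Toolbox for gpuDevice. The fallback rule (use the highest value in the "Allowed values" list from the error message) is an illustrative choice, not MathWorks guidance, and designFileName and inputs are the same placeholders as above. As the follow-up comment explains, in R2021a the profiler still errored even with the CC set explicitly, so this only shows how the option would be built.

```matlab
% Compute capabilities that GpuConfig accepts in R2021a, copied from the
% 'Allowed values' list in the error message in the question.
allowedCc = {'3.2', '3.5', '3.7', '5.0', '5.2', '5.3', '6.0', '6.1', ...
             '6.2', '7.0', '7.2', '7.5', '8.0'};
isAllowed = @(cc) any(strcmp(cc, allowedCc));

% Read the CC of the current device (needs Parallel Computing Toolbox).
% If no GPU can be queried, assume '8.6' (an RTX 3070) for illustration.
deviceCc = '8.6';
try
    d = gpuDevice;
    deviceCc = d.ComputeCapability;
catch err
    fprintf('Could not query the GPU (%s); assuming CC %s.\n', ...
            err.message, deviceCc);
end

% Keep the device CC if this release accepts it; otherwise fall back to
% the highest accepted value ('8.0').
if isAllowed(deviceCc)
    ccForCodegen = deviceCc;
else
    ccForCodegen = allowedCc{end};
end
fprintf('Device CC %s, requesting CC %s\n', deviceCc, ccForCodegen);

gpuOptions = {'ComputeCapability', ccForCodegen};
% Uncomment once designFileName and inputs are defined as in the example:
% gpucoder.profile(designFileName, inputs, 'GpuConfigurationOptions', gpuOptions);
```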
Nathan Malimban
on 20 Jul 2021
Thanks for including the error here. It shows that even though the CC is set explicitly, the profiler still picks up the default and errors anyway. I will include this detail in the internal report so we can look into it. Unfortunately, this means there is currently no workaround.
CUDA 11.1 through 11.4 all appear to support CC 8.6, so as soon as MATLAB supports an upgraded CUDA version, CC 8.6 should also be supported. MATLAB upgrades its supported CUDA version regularly, but I'm not sure whether that will happen as soon as R2021b.
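Whether the CUDA version MATLAB uses can target CC 8.6 can be checked against the ToolkitVersion property that gpuDevice reports. A minimal sketch, assuming Parallel Computing Toolbox; the two-entry table (CUDA 11.0 for CC 8.0, and CUDA 11.1, the first release in the range above, for CC 8.6) is written by hand here rather than queried from MATLAB or CUDA.

```matlab
% Minimum CUDA toolkit able to target each compute capability. Only the
% two Ampere entries discussed in this thread are listed; capabilities
% missing from the table are not checked by this sketch.
minToolkit = containers.Map({'8.0', '8.6'}, {11.0, 11.1});
toolkitSupports = @(cc, toolkit) ~isKey(minToolkit, cc) || ...
                                 toolkit >= minToolkit(cc);

% Compare the device CC with the toolkit this MATLAB release uses
% (needs Parallel Computing Toolbox and a CUDA-capable GPU).
try
    d = gpuDevice;
    if toolkitSupports(d.ComputeCapability, d.ToolkitVersion)
        fprintf('CUDA %.1f can target CC %s.\n', ...
                d.ToolkitVersion, d.ComputeCapability);
    else
        fprintf('CUDA %.1f cannot target CC %s; CUDA %.1f or later is needed.\n', ...
                d.ToolkitVersion, d.ComputeCapability, ...
                minToolkit(d.ComputeCapability));
    end
catch err
    fprintf('Could not query the GPU: %s\n', err.message);
end
```

On R2021a (CUDA 11.0) with an RTX 3070, this would take the second branch; a release built on CUDA 11.1 or later would take the first.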
More Answers (1)
Joss Knight
on 22 Jul 2021
Regarding the gpuBench results: no, MATLAB is definitely not using only half the cores! What you are seeing is the raw performance of SGEMM in NVIDIA's cuBLAS library in CUDA 11.0. My understanding is that cuBLAS is still undergoing considerable optimisation for compute capability 8.6 devices, and indeed we see that confirmed by some improvements when upgrading to CUDA 11.2 (for which you'll have to wait until next year).
However, the performance of MTIMES still does not reach the theoretical maximum, and perhaps it never will. If you click on your single-precision MTIMES result in the gpuBench report, it will take you to the graph, where you'll see that performance peaks and flattens out at a certain matrix size. It may be that on these devices memory bandwidth starts to become more of a bottleneck at larger sizes. The CUDA-Z benchmark no doubt simply runs floating-point operations inside a kernel without any input or output data at all. That is great for testing raw compute power, but not so useful for working out how fast the card is at doing something genuinely useful.
We're going to continue investigating to see whether we can find out why cuBLAS performance isn't as good as expected on these cards, and whether there is anything you can do about it.
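The MTIMES curve in the gpuBench report can be approximated by timing single-precision matrix products over a range of sizes and converting each time to GFLOPS with the conventional 2n³ operation count. A minimal sketch, assuming Parallel Computing Toolbox; the size sweep is an arbitrary choice and this is not gpuBench's own code.

```matlab
% Conventional FLOP count for an n-by-n times n-by-n product: about
% n^3 multiplies plus n^3 adds, i.e. 2*n^3 operations.
sgemmGflops = @(n, seconds) 2 * n^3 / seconds / 1e9;

sizes = [512 1024 2048 4096 8192];   % assumed sweep; reduce if memory is short
try
    for n = sizes
        A = rand(n, 'single', 'gpuArray');
        B = rand(n, 'single', 'gpuArray');
        t = gputimeit(@() A * B);    % typical run time, waiting for the GPU to finish
        fprintf('n = %5d: %8.1f GFLOPS\n', n, sgemmGflops(n, t));
    end
catch err
    fprintf('GPU benchmark skipped: %s\n', err.message);
end
```

If the printed rate stops growing beyond some size, that is the plateau described in the answer, and dividing it by the card's theoretical peak gives the fraction of peak that cuBLAS SGEMM reaches.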