Why is the CNN predict function faster when running a set of images from ImageDataStore compared to running each image individually?

Question

Eric Louchard on 17 Jul 2021

https://es.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to

Answered: Eric Louchard on 21 Jul 2021

I am trying this example code:

Create Simple Deep Learning Network for Classification - MATLAB & Simulink Example (mathworks.com)

One thing I notice is that running the classify function on the imageDatastore (imdsValidation) is much faster than running it on one image multiple times. Is this some sort of batch process that uses MATLAB vectorization to speed things up, or is it inherent to a CNN?

YPred = classify(net,imdsValidation);

And a related question: when using codegen or cnncodegen, can the resulting C++ code also run multiple images like this? I cannot see a way to do it with the generated C++ code.

Answer 1

Vineet Joshi on 20 Jul 2021

https://es.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to#answer_750258

Hi

As the MiniBatchSize section of the classify documentation page explains, larger mini-batches can result in faster prediction but require more memory. The speedup therefore comes from batching and is not something specific to CNNs.
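To illustrate the batching effect without needing a trained network, here is a minimal sketch in plain MATLAB. It is only a stand-in: the weight matrix W and the random images are hypothetical, and a single matrix product is not what classify does internally. It just shows that 64 small calls and one batched call give the same result, with the batched call doing the work in a single operation. With a real network, the batch size is controlled through the MiniBatchSize name-value argument of classify.

```matlab
% Stand-in for one network layer: a 10x4096 weight matrix applied to
% flattened 64x64 images. Hypothetical data, not a real trained network.
rng(0);                                  % reproducible random data
W = randn(10, 64*64);                    % hypothetical layer weights
images = rand(64, 64, 1, 64);            % 64 images as a 4-D array (H x W x C x N)
numImages = size(images, 4);

% One image per call: 64 small matrix-vector products.
tic
scoresLoop = zeros(10, numImages);
for k = 1:numImages
    x = reshape(images(:, :, :, k), [], 1);   % flatten image k to a column
    scoresLoop(:, k) = W * x;
end
tLoop = toc;

% Whole batch in one call: a single matrix-matrix product.
tic
X = reshape(images, [], numImages);      % each column is one flattened image
scoresBatch = W * X;
tBatch = toc;

fprintf('loop: %.6f s, batch: %.6f s\n', tLoop, tBatch);

% Both approaches compute the same scores (up to floating-point rounding).
maxDiff = max(abs(scoresLoop(:) - scoresBatch(:)));
fprintf('max difference between the two results: %g\n', maxDiff);
```

The measured times depend on the machine, so no particular speedup is claimed here; the point is that the batched form replaces many small operations with one large one, which is what hardware and libraries handle most efficiently.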

As for your second question, the cnncodegen function only generates the code for the network; how you run inference with it is up to you. You can write code that calls the network sequentially on the generated C++ code, or use other techniques such as multiple workers and parallel computing to make it faster in a batch setting.

Hope this was helpful.

Thanks

Eric Louchard on 20 Jul 2021

Thanks for the reply! I have been doing some more experiments and found that passing a 4D array of images takes advantage of some parallel processing in MATLAB. I have another question once I describe the findings.

~~~~~~~~~~~~~~

I stacked 64 images into a 4D datacube and compared one call on the datacube versus 64 calls on a single image. The 4D datacube call was around 10x faster.

tic

for loop = 1:64

score = Fastnet_LWIR.predict(imLWIR);

end

toc

tic

score = Fastnet_LWIR.predict(im4D);

toc

Elapsed time is 0.212989 seconds.

Elapsed time is 0.023708 seconds.

So, knowing this, I tried to make a MEX file using codegen and a simple prediction function, passing -args {ones(64,64,1,64,'uint8')} for a 64-deep 4D datacube.

cfg = coder.gpuConfig('mex');

cfg.TargetLang = 'C++';

cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,64,'uint8')} -report

This produced a MEX file that accepts only the 4D datacube as input, not a single frame, but it shows the same kind of speedup.
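Because the generated function accepts only a fixed 64x64x1x64 input, an arbitrary number of images has to be fed to it in chunks of exactly 64. The following is a minimal sketch of one way to do that, padding the last partial chunk with zeros and discarding the padded results. The function handle predictFixedBatch is a hypothetical stand-in for the generated MEX function (here it just returns the mean pixel value per image), and the image count of 150 is arbitrary.

```matlab
% Hypothetical stand-in for the generated MEX function: it expects a
% 64x64x1x64 uint8 array and returns one value per image (64x1).
batchSize = 64;
predictFixedBatch = @(batch) squeeze(mean(mean(double(batch), 1), 2));

numImages = 150;                                   % not a multiple of 64
images = uint8(randi([0 255], 64, 64, 1, numImages));
scores = zeros(numImages, 1);

for first = 1:batchSize:numImages
    last = min(first + batchSize - 1, numImages);
    count = last - first + 1;                      % images in this chunk

    % Pad the final partial chunk with zeros so the input size never changes.
    batch = zeros(64, 64, 1, batchSize, 'uint8');
    batch(:, :, :, 1:count) = images(:, :, :, first:last);

    out = predictFixedBatch(batch);                % always batchSize results
    scores(first:last) = out(1:count);             % drop results for padding
end
```

Padding wastes some computation on the last chunk, but it keeps every call at the one input size the generated code was built for.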

However, when I tried to build a C++ DLL or static library, I kept getting errors, so I tried cnncodegen instead. That worked, but I think the generated code makes only a single call to predict and does not do any sort of parallel processing. I tried 'batchsize',64 in the call to cnncodegen below.

cnncodegen(Fastnet_LWIR,'targetlib','cudnn','ComputeCapability','6.1','targetparams',struct('AutoTuning',true,'DataType','FP32'),'batchsize',64,'codegenonly',1)

~~~~~~~~~~~~~~~~~ now to the question

Is there a way to call cnncodegen so that the generated C++ code works on 4D datacubes and takes advantage of parallel processing?

This is the error I got when trying to use codegen with 'lib':

cfg = coder.gpuConfig('lib');

cfg.TargetLang = 'C++';

cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,64,'uint8')} -report

**********************************************************************

** Visual Studio 2017 Developer Command Prompt v15.5.0

**********************************************************************

[vcvarsall.bat] Environment initialized for: 'x64'

Microsoft (R) Program Maintenance Utility Version 14.12.25830.2

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D BUILDING_TEST_CNN_PREDICTOR -D MODEL=test_CNN_predictor -D MODEL=test_CNN_predictor -o "MWElementwiseAffineLayer.obj" "D:\Fastnet_LWIR\codegen\dll\test_CNN_predictor\MWElementwiseAffineLayer.cpp"

MWElementwiseAffineLayer.cpp

C:\EngTools\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\crtdefs.h(10): fatal error C1083: Cannot open include file: 'corecrt.h': No such file or directory

NMAKE : fatal error U1077: '"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.EXE"' : return code '0x2'

Stop.

The make command returned an error of 2

Error(s) encountered while building "test_CNN_predictor":

### Failed to generate all binary outputs.

------------------------------------------------------------------------

??? Build error: C++ compiler produced errors. See the Build Log for further details.
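One detail visible in the log above: nvcc is invoked with -arch sm_35, whereas the cnncodegen call requested compute capability 6.1. A hedged sketch of requesting the same target through the codegen configuration follows, assuming the GpuConfig.ComputeCapability property is available in the installed GPU Coder release. This would not by itself resolve the corecrt.h error; that header belongs to the Universal CRT shipped with the Windows 10 SDK, so the failure suggests the SDK component is missing from the Visual Studio installation or not visible to the build environment.

```matlab
% Configuration sketch only; requires GPU Coder and the entry-point
% function Fastnet_LWIR_predict from the post above.
cfg = coder.gpuConfig('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% Assumption: match the compute capability used in the cnncodegen call.
cfg.GpuConfig.ComputeCapability = '6.1';

codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,64,'uint8')} -report
```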

Answer 2

Eric Louchard on 21 Jul 2021

https://es.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to#answer_750818

When I test codegen, I now get this:

clear cfg

cfg = coder.gpuConfig('lib');

cfg.TargetLang = 'C++';

cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,'uint8')} -report

Warning: Validation warning(s):

The following macro(s) in the build configuration options were not found in both the declared list of toolchain macros and

standard code-generation macros:

conlibs

If the above macro(s) is not defined at the point when the makefile is invoked, the build may fail.

> In coder.make.ToolchainInfo/validate

In coder.make.invokeBuilder

In RTW.genMakefileAndBuild (line 458)

In coder.internal.doCompile

In emcBuildRTW

In emcGenMakefileAndBuild

In emcBuildTarget

In emlcprivate

In coder.internal.compile

In emlckernel

In emlcprivate

In codegen

------------------------------------------------------------------------

**********************************************************************

** Visual Studio 2017 Developer Command Prompt v15.5.0

**********************************************************************

[vcvarsall.bat] Environment initialized for: 'x64'

Microsoft (R) Program Maintenance Utility Version 14.12.25830.2

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayer.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayer.cpp"

MWElementwiseAffineLayer.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWFusedConvReLULayer.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWFusedConvReLULayer.cpp"

MWFusedConvReLULayer.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "cnn_api.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\cnn_api.cpp"

cnn_api.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWCNNLayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWCNNLayerImpl.cu"

MWCNNLayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayerImpl.cu"

MWElementwiseAffineLayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayerImplKernel.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayerImplKernel.cu"

MWElementwiseAffineLayerImplKernel.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWFusedConvReLULayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWFusedConvReLULayerImpl.cu"

MWFusedConvReLULayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWTargetNetworkImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWTargetNetworkImpl.cu"

MWTargetNetworkImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_rtwutil.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_rtwutil.cu

Fastnet_LWIR_predict_rtwutil.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_data.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_data.cu

Fastnet_LWIR_predict_data.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_initialize.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_initialize.cu

Fastnet_LWIR_predict_initialize.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_terminate.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_terminate.cu

Fastnet_LWIR_predict_terminate.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict.cu

Fastnet_LWIR_predict.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "DeepLearningNetwork.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\DeepLearningNetwork.cu

DeepLearningNetwork.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "predict.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\predict.cu

predict.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWCudaDimUtility.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWCudaDimUtility.cu"

MWCudaDimUtility.cu

'cmd' is not recognized as an internal or external command,

operable program or batch file.

NMAKE : fatal error U1077: 'cmd' : return code '0x1'

Stop.

The make command returned an error of 2

Error(s) encountered while building "Fastnet_LWIR_predict":

### Failed to generate all binary outputs.

------------------------------------------------------------------------

??? Build error: C++ compiler produced errors. See the Build Log for further details.
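The message 'cmd' is not recognized usually means the Windows system directory is not on the PATH seen by the build tools. The sketch below is one way to check this from MATLAB. It is only a diagnostic under that assumption, not a confirmed fix for this build; the helper isOnPath is a hypothetical name introduced here.

```matlab
% Hypothetical helper: true if dirName is one of the entries in pathValue.
% Trailing slashes are ignored and the comparison is case-insensitive.
isOnPath = @(pathValue, dirName, sep) any(strcmpi( ...
    regexprep(strsplit(pathValue, sep), '[\\/]+$', ''), dirName));

% Check the PATH of the current MATLAB session. The fix is only
% attempted on Windows, where cmd.exe lives in System32.
if ispc
    sysDir = fullfile(getenv('SystemRoot'), 'System32');
    if isOnPath(getenv('PATH'), sysDir, pathsep)
        disp('System32 is on the PATH seen by MATLAB.');
    else
        disp('System32 is missing from PATH; prepending it for this session.');
        setenv('PATH', [sysDir pathsep getenv('PATH')]);
    end
end
```

A change made with setenv lasts only for the current MATLAB session, so if this turns out to be the cause, the system PATH itself should be corrected.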

Code generation failed: View Error Report

Error using codegen
