In a typical convolutional neural network (CNN) workflow, you start by constructing a CNN architecture with Deep Learning Toolbox™ and train the network by using Parallel Computing Toolbox™. Alternatively, you can import a CNN that has already been trained on a large dataset and transfer its learned features. Transfer learning means taking a CNN trained for one set of classification problems and retraining it to classify a different set of classes. In this case, the last few layers of the CNN are relearned. Again, Parallel Computing Toolbox is used in the learning phase. You can also import a trained CNN from other frameworks, such as Caffe or MatConvNet, into a
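The idea behind transfer learning (keep the pretrained layers fixed, relearn only the final layer for the new classes) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Deep Learning Toolbox workflow: the hypothetical `FROZEN_W` matrix stands in for the pretrained early layers, the new head is a softmax classifier trained with stochastic gradient descent on cross-entropy loss, and the tiny `DATA` set is made up for the example.

```python
import math

# Hypothetical "pretrained" layer (assumption): these weights stay frozen,
# standing in for the early CNN layers whose learned features are reused.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def extract_features(x):
    """Apply the frozen layer followed by a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(W, b, f):
    return [sum(wi * fi for wi, fi in zip(W[k], f)) + b[k] for k in range(len(W))]

def train_new_head(data, num_classes, epochs=200, lr=0.1):
    """Relearn only a new final layer (weights and biases) for the new classes."""
    n_feat = len(FROZEN_W)
    W = [[0.0] * n_feat for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, label in data:
            f = extract_features(x)  # frozen layers: forward pass only
            p = softmax(scores(W, b, f))
            for k in range(num_classes):
                g = p[k] - (1.0 if k == label else 0.0)  # d(cross-entropy)/d(score_k)
                for j in range(n_feat):
                    W[k][j] -= lr * g * f[j]
                b[k] -= lr * g
    return W, b

def predict(W, b, x):
    s = scores(W, b, extract_features(x))
    return s.index(max(s))

# Hypothetical new task with two classes (made-up data for illustration).
DATA = [([1.0, 0.1], 0), ([0.9, 0.0], 0), ([0.1, 1.0], 1), ([0.0, 0.9], 1)]

if __name__ == "__main__":
    W, b = train_new_head(DATA, num_classes=2)
    print(predict(W, b, [1.0, 0.0]), predict(W, b, [0.0, 1.0]))
```

Only `W` and `b` change during training; `FROZEN_W` is read but never updated, which is what distinguishes this from training the whole network from scratch.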
Once you have obtained the trained network, you can use GPU Coder™ to generate C++ or CUDA® code and deploy the CNN to multiple embedded platforms that use NVIDIA® or ARM® GPU processors. The generated code implements the CNN by using the architecture, the layers, and the parameters that you specify in the input
The code generator takes advantage of the NVIDIA CUDA Deep Neural Network library (cuDNN) and the NVIDIA TensorRT™ high-performance inference library for NVIDIA GPUs, and of the ARM Compute Library for computer vision and machine learning for ARM Mali GPUs.
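Conceptually, the network is an ordered list of layers (the architecture), each with its own type and learned parameters, and inference applies them in sequence. The plain-Python sketch below shows what a convolution layer and a ReLU layer compute. It is not GPU Coder output: in the generated code these operations are delegated to optimized libraries such as cuDNN, TensorRT, or the ARM Compute Library. The list-of-tuples `NETWORK` format, its layer names, and its parameter values are hypothetical.

```python
def conv2d_valid(image, kernel, bias):
    """Single-channel 2-D cross-correlation with no padding ('valid'),
    which is what CNN frameworks call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise max(0, x) activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

# Hypothetical network description: architecture = layer order,
# parameters = the learned values stored with each layer.
NETWORK = [
    ("conv", {"kernel": [[0.0, 1.0], [1.0, 0.0]], "bias": -7.0}),
    ("relu", {}),
]

def forward(network, x):
    """Run the input through each layer in order."""
    for kind, params in network:
        if kind == "conv":
            x = conv2d_valid(x, params["kernel"], params["bias"])
        elif kind == "relu":
            x = relu(x)
        else:
            raise ValueError(f"unsupported layer: {kind}")
    return x

if __name__ == "__main__":
    print(forward(NETWORK, [[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the 3-by-3 input above, the convolution produces `[[-1.0, 1.0], [5.0, 7.0]]`, and the ReLU layer clamps the negative value, giving `[[0.0, 1.0], [5.0, 7.0]]`.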
You can integrate the generated code into your project as source code, static or dynamic libraries, or executables, and deploy it to a variety of NVIDIA and ARM Mali GPU platforms. For deep learning on ARM Mali GPU targets, you generate code on the host development computer. Then, to build and run the executable program, move the generated code to the ARM target platform.