dlquantizer
Description
Use the dlquantizer
object to reduce the memory requirement of a
deep neural network by quantizing weights, biases, and activations to 8-bit scaled integer
data types. You can create and verify the behavior of a quantized network for GPU, FPGA, CPU
deployment, or explore the quantized network in MATLAB®.
For CPU and GPU deployment, the software generates code for a convolutional deep neural
network by quantizing the weights, biases, and activations of the convolution layers to 8-bit
scaled integer data types. The quantization is performed by providing the calibration result
file produced by the calibrate
function
to the codegen
(MATLAB Coder) command.
Code generation does not support quantized deep neural networks produced by the quantize
function.
This object requires Deep Learning Toolbox Model Quantization Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.
Creation
Description
Input Arguments
net
— Pretrained neural network
DAGNetwork
object | dlnetwork
object | SeriesNetwork
object | yolov2ObjectDetector
object | yolov3ObjectDetector
object | yolov4ObjectDetector
object | ssdObjectDetector
object
Pretrained neural network, specified as a DAGNetwork
,
dlnetwork
, SeriesNetwork
, yolov2ObjectDetector
(Computer Vision Toolbox), yolov3ObjectDetector
(Computer Vision Toolbox), yolov4ObjectDetector
(Computer Vision Toolbox), or a ssdObjectDetector
(Computer Vision Toolbox) object.
Properties
NetworkObject
— Pretrained neural network
DAGNetwork
object | dlnetwork
object | SeriesNetwork
object | yolov2ObjectDetector
object | yolov3ObjectDetector
| yolov4ObjectDetector
| ssdObjectDetector
object
This property is read-only.
Pre-trained neural network, specified as a DAGNetwork
,
dlnetwork
, SeriesNetwork
,
yolov2ObjectDetector
(Computer Vision Toolbox), yolov3ObjectDetector
(Computer Vision Toolbox), yolov4ObjectDetector
(Computer Vision Toolbox), or a ssdObjectDetector
(Computer Vision Toolbox) object.
ExecutionEnvironment
— Execution environment
'GPU'
(default) | 'FPGA'
| 'CPU'
| 'MATLAB'
Execution environment for the quantized network, specified as
'GPU'
, 'FPGA'
, 'CPU'
, or
'MATLAB'
. How the network is quantized depends on the choice of
execution environment.
The 'MATLAB'
execution environment indicates a target-agnostic
quantization of the neural network will be performed. This option does not require you
to have target hardware in order to explore the quantized network in MATLAB.
Example: 'ExecutionEnvironment'
,'FPGA'
Simulation
— Validate behavior of network quantized for FPGA environment using simulation
'off'
(default) | 'on'
Whether to validate behavior of network quantized for FPGA using simulation, specified as one of these values:
'on'
— Validate the behavior of the quantized network by simulating the quantized network in MATLAB and comparing the prediction results of the original single-precision network to the simulated prediction results of the quantized network.'off'
— Generate code and validate the behavior of the quantized network on the target hardware.
Note
This option is only valid when ExecutionEnvironment is set to 'FPGA'
.
Note
Alternatively, you can use the quantize
method to create a simulatable quantized network. The simulatable quantized network
enables visibility of the quantized layers, weights, and biases of the network, as
well as simulatable quantized inference behavior.
Example: 'Simulation'
, 'on'
Object Functions
Examples
Quantize a Neural Network for GPU Target
This example shows how to quantize learnable parameters in the convolution layers of a neural network for GPU and explore the behavior of the quantized network. In this example, you quantize the squeezenet neural network after retraining the network to classify new images according to the Train Deep Learning Network to Classify New Images example. In this example, the memory required for the network is reduced approximately 75% through quantization while the accuracy of the network is not affected.
Load the pretrained network. net
is the output network of the Train Deep Learning Network to Classify New Images example.
load squeezenetmerch
net
net = DAGNetwork with properties: Layers: [68×1 nnet.cnn.layer.Layer] Connections: [75×2 table] InputNames: {'data'} OutputNames: {'new_classoutput'}
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
In this example, use the images in the MerchData
data set. Define an augmentedImageDatastore
object to resize the data for the network. Then, split the data into calibration and validation data sets.
unzip('MerchData.zip'); imds = imageDatastore('MerchData', ... 'IncludeSubfolders',true, ... 'LabelSource','foldernames'); [calData, valData] = splitEachLabel(imds, 0.7, 'randomized'); aug_calData = augmentedImageDatastore([227 227], calData); aug_valData = augmentedImageDatastore([227 227], valData);
Create a dlquantizer
object and specify the network to quantize.
quantObj = dlquantizer(net);
Define a metric function to use to compare the behavior of the network before and after quantization. This example uses the hComputeModelAccuracy
metric function.
function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore) %% Computes model-level accuracy statistics % Load ground truth tmp = readall(dataStore); groundTruth = tmp.response; % Compare with predicted label with actual ground truth predictionError = {}; for idx=1:numel(groundTruth) [~, idy] = max(predictionScores(idx,:)); yActual = net.Layers(end).Classes(idy); predictionError{end+1} = (yActual == groundTruth(idx)); %#ok end % Sum all prediction errors. predictionError = [predictionError{:}]; accuracy = sum(predictionError)/numel(predictionError); end
Specify the metric function in a dlquantizationOptions
object.
quantOpts = dlquantizationOptions('MetricFcn',{@(x)hComputeModelAccuracy(x, net, aug_valData)});
Use the calibrate
function to exercise the network with sample inputs and collect range information. The calibrate
function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
calResults = calibrate(quantObj, aug_calData)
calResults=121×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ ____________________ ________________________ _________ ________
{'conv1_Weights' } {'conv1' } "Weights" -0.91985 0.88489
{'conv1_Bias' } {'conv1' } "Bias" -0.07925 0.26343
{'fire2-squeeze1x1_Weights'} {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
{'fire2-squeeze1x1_Bias' } {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
{'fire2-expand1x1_Weights' } {'fire2-expand1x1' } "Weights" -0.7406 0.90982
{'fire2-expand1x1_Bias' } {'fire2-expand1x1' } "Bias" -0.060056 0.14602
{'fire2-expand3x3_Weights' } {'fire2-expand3x3' } "Weights" -0.74397 0.66905
{'fire2-expand3x3_Bias' } {'fire2-expand3x3' } "Bias" -0.051778 0.074239
{'fire3-squeeze1x1_Weights'} {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
{'fire3-squeeze1x1_Bias' } {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
{'fire3-expand1x1_Weights' } {'fire3-expand1x1' } "Weights" -0.72035 0.9743
{'fire3-expand1x1_Bias' } {'fire3-expand1x1' } "Bias" -0.067029 0.30425
{'fire3-expand3x3_Weights' } {'fire3-expand3x3' } "Weights" -0.61443 0.7741
{'fire3-expand3x3_Bias' } {'fire3-expand3x3' } "Bias" -0.053613 0.10329
{'fire4-squeeze1x1_Weights'} {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
{'fire4-squeeze1x1_Bias' } {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
Use the validate
function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions
object to compare the results of the network before and after quantization.
valResults = validate(quantObj, aug_valData, quantOpts)
valResults = struct with fields:
NumSamples: 20
MetricResults: [1×1 struct]
Statistics: [2×2 table]
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
valResults.Statistics
ans=2×2 table
NetworkImplementation LearnableParameterMemory(bytes)
_____________________ _______________________________
{'Floating-Point'} 2.9003e+06
{'Quantized' } 7.3393e+05
In this example, the memory required for the network was reduced approximately 75% through quantization. The accuracy of the network is not affected.
The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
Quantize a Neural Network for CPU Target
This example uses:
- Deep Learning ToolboxDeep Learning Toolbox
- Deep Learning Toolbox Model Quantization LibraryDeep Learning Toolbox Model Quantization Library
- MATLAB CoderMATLAB Coder
- MATLAB Support Package for Raspberry Pi HardwareMATLAB Support Package for Raspberry Pi Hardware
- Embedded CoderEmbedded Coder
- MATLAB Coder Interface for Deep LearningMATLAB Coder Interface for Deep Learning
This example shows how to quantize and validate a neural network for a CPU target. This workflow is similar to other execution environments, but before validating you must establish a raspi
connection and specify it as target using dlquantizationOptions
.
First, load your network. This example uses the pretrained network squeezenet
.
load squeezenetmerch
net
net = DAGNetwork with properties: Layers: [68×1 nnet.cnn.layer.Layer] Connections: [75×2 table] InputNames: {'data'} OutputNames: {'new_classoutput'}
Then define your calibration and validation data, calDS
and valDS
respectively.
unzip('MerchData.zip'); imds = imageDatastore('MerchData', ... 'IncludeSubfolders',true, ... 'LabelSource','foldernames'); [calData, valData] = splitEachLabel(imds, 0.7, 'randomized'); aug_calData = augmentedImageDatastore([227 227],calData); aug_valData = augmentedImageDatastore([227 227],valData);
Create the dlquantizer
object and specify a CPU execution environment.
dq = dlquantizer(net,'ExecutionEnvironment','CPU')
dq = dlquantizer with properties: NetworkObject: [1×1 DAGNetwork] ExecutionEnvironment: 'CPU'
Calibrate the network.
calResults = calibrate(dq,aug_calData,'UseGPU','off')
calResults=122×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ ____________________ ________________________ _________ ________
{'conv1_Weights' } {'conv1' } "Weights" -0.91985 0.88489
{'conv1_Bias' } {'conv1' } "Bias" -0.07925 0.26343
{'fire2-squeeze1x1_Weights'} {'fire2-squeeze1x1'} "Weights" -1.38 1.2477
{'fire2-squeeze1x1_Bias' } {'fire2-squeeze1x1'} "Bias" -0.11641 0.24273
{'fire2-expand1x1_Weights' } {'fire2-expand1x1' } "Weights" -0.7406 0.90982
{'fire2-expand1x1_Bias' } {'fire2-expand1x1' } "Bias" -0.060056 0.14602
{'fire2-expand3x3_Weights' } {'fire2-expand3x3' } "Weights" -0.74397 0.66905
{'fire2-expand3x3_Bias' } {'fire2-expand3x3' } "Bias" -0.051778 0.074239
{'fire3-squeeze1x1_Weights'} {'fire3-squeeze1x1'} "Weights" -0.7712 0.68917
{'fire3-squeeze1x1_Bias' } {'fire3-squeeze1x1'} "Bias" -0.10138 0.32675
{'fire3-expand1x1_Weights' } {'fire3-expand1x1' } "Weights" -0.72035 0.9743
{'fire3-expand1x1_Bias' } {'fire3-expand1x1' } "Bias" -0.067029 0.30425
{'fire3-expand3x3_Weights' } {'fire3-expand3x3' } "Weights" -0.61443 0.7741
{'fire3-expand3x3_Bias' } {'fire3-expand3x3' } "Bias" -0.053613 0.10329
{'fire4-squeeze1x1_Weights'} {'fire4-squeeze1x1'} "Weights" -0.7422 1.0877
{'fire4-squeeze1x1_Bias' } {'fire4-squeeze1x1'} "Bias" -0.10885 0.13881
⋮
Use the MATLAB Support Package for Raspberry Pi Hardware function, raspi
, to create a connection to the Raspberry Pi. In the following code, replace:
raspiname
with the name or address of your Raspberry Piusername
with your user namepassword
with your password
% r = raspi('raspiname','username','password')
For example,
r = raspi('gpucoder-raspberrypi-7','pi','matlab')
r = raspi with properties: DeviceAddress: 'gpucoder-raspberrypi-7' Port: 18734 BoardName: 'Raspberry Pi 3 Model B+' AvailableLEDs: {'led0'} AvailableDigitalPins: [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27] AvailableSPIChannels: {} AvailableI2CBuses: {} AvailableWebcams: {} I2CBusSpeed: AvailableCANInterfaces: {} Supported peripherals
Specify raspi
object as the target for the quantized network.
opts = dlquantizationOptions('Target',r)
opts = dlquantizationOptions with properties: MetricFcn: {} Bitstream: '' Target: [1×1 raspi]
Validate the quantized network with the validate
function.
valResults = validate(dq,aug_valData,opts)
### Starting application: 'codegen\lib\validate_predict_int8\pil\validate_predict_int8.elf' To terminate execution: clear validate_predict_int8_pil ### Launching application validate_predict_int8.elf... ### Host application produced the following standard output (stdout) and standard error (stderr) messages:
valResults = struct with fields:
NumSamples: 20
MetricResults: [1×1 struct]
Statistics: []
Examine the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 0.95
{'Quantized' } 0.95
Quantize Network for FPGA Deployment
This example uses:
- Deep Learning HDL ToolboxDeep Learning HDL Toolbox
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC DevicesDeep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Deep Learning ToolboxDeep Learning Toolbox
- Deep Learning Toolbox Model Quantization LibraryDeep Learning Toolbox Model Quantization Library
- MATLAB Coder Interface for Deep LearningMATLAB Coder Interface for Deep Learning
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8
network to a target FPGA board.
For this example, you need:
Deep Learning Toolbox ™
Deep Learning HDL Toolbox ™
Deep Learning Toolbox Model Quantization Library
Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
MATLAB Coder Interface for Deep Learning.
Load Pretrained Network
Load the pretrained LogoNet network and analyze the network architecture.
snet = getLogoNetwork; deepNetworkDesigner(snet);
Load Data
This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an augmentedImageDatastore
object for calibration and validation. Expedite calibration and validation by reducing the calibration data set to 20 images. The MATLAB simulation workflow has a maximum limit of five images when validating the quantized network. Reduce the validation data set sizes to five images. The FPGA validation workflow has a maximum limit of one image when validating the quantized network. Reduce the FPGA validation data set to a single image.
curDir = pwd; unzip("logos_dataset.zip"); imageData = imageDatastore(fullfile(curDir,'logos_dataset'),... 'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames'); [calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized'); calibrationData_reduced = calibrationData.subset(1:20); validationData_simulation = validationData.subset(1:5); validationData_FPGA = validationData.subset(1:1);
Generate Calibration Result File for the Network
Create a dlquantizer
object and specify the network to quantize. Specify the execution environment as FPGA.
dlQuantObj_simulation = dlquantizer(snet,'ExecutionEnvironment',"FPGA",'Simulation','on'); dlQuantObj_FPGA = dlquantizer(snet,'ExecutionEnvironment',"FPGA");
Use the calibrate
function to exercise the network with sample inputs and collect the range information. The calibrate
function collects the dynamic ranges of the weights and biases. The calibrate function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.
calibrate(dlQuantObj_simulation,calibrationData_reduced)
ans=35×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ __________________ ________________________ ___________ __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 193.72
⋮
calibrate(dlQuantObj_FPGA,calibrationData_reduced)
ans=35×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ __________________ ________________________ ___________ __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 193.72
⋮
Create Target Object
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2020.2. To set the Xilinx Vivado toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');
To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Alternatively, you can also use the JTAG interface.
% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
Create dlQuantizationOptions
Object
Create a dlquantizationOptions
object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.
options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget); options_simulation = dlquantizationOptions;
To use a custom metric function, specify the metric function in the dlquantizationOptions
object.
options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_FPGA)},'Bitstream','zcu102_int8','Target',hTarget); options_simulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_simulation)})
Validate Quantized Network
Use the validate
function to quantize the learnable parameters in the convolution layers of the network. The validate
function simulates the quantized network in MATLAB. The validate
function uses the metric function defined in the dlquantizationOptions
object to compare the results of the single-data-type network object to the results of the quantized network object.
prediction_simulation = dlQuantObj_simulation.validate(validationData_simulation,options_simulation)
Compiling leg: conv_1>>relu_4 ... Compiling leg: conv_1>>relu_4 ... complete. Compiling leg: maxpool_4 ... Compiling leg: maxpool_4 ... complete. Compiling leg: fc_1>>fc_3 ... Compiling leg: fc_1>>fc_3 ... complete.
prediction_simulation = struct with fields:
NumSamples: 5
MetricResults: [1×1 struct]
Statistics: []
For validation on an FPGA, the validate function:
Programs the FPGA board by using the output of the
compile
method and the programming fileDownloads the network weights and biases
Compares the performance of the network before and after quantization
prediction_FPGA = dlQuantObj_FPGA.validate(validationData_FPGA,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### The network includes the following layers: 1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer) 2 'conv_1' Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'maxpool_3' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 11 'conv_4' Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 12 'relu_4' ReLU ReLU (HW Layer) 13 'maxpool_4' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer) 15 'relu_5' ReLU ReLU (HW Layer) 16 'dropout_1' Dropout 50% dropout (HW Layer) 17 'fc_2' Fully Connected 2048 fully connected layer (HW Layer) 18 'relu_6' ReLU ReLU (HW Layer) 19 'dropout_2' Dropout 50% dropout (HW Layer) 20 'fc_3' Fully Connected 32 fully connected layer (HW Layer) 21 'softmax' Softmax softmax (HW Layer) 22 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer) ### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv_1>>relu_4 ... ### Compiling layer group: conv_1>>relu_4 ... complete. ### Compiling layer group: maxpool_4 ... ### Compiling layer group: maxpool_4 ... complete. ### Compiling layer group: fc_1>>fc_3 ... ### Compiling layer group: fc_1>>fc_3 ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "12.0 MB" "OutputResultOffset" "0x00c00000" "4.0 MB" "SchedulerDataOffset" "0x01000000" "4.0 MB" "SystemBufferOffset" "0x01400000" "36.0 MB" "InstructionDataOffset" "0x03800000" "8.0 MB" "ConvWeightDataOffset" "0x04000000" "12.0 MB" "FCWeightDataOffset" "0x04c00000" "12.0 MB" "EndOffset" "0x05800000" "Total: 88.0 MB" ### Network compilation complete. ### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA. ### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA. ### Finished writing input activations. ### Running single input activation. Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 248358 274080 90.62 DSPs 384 2520 15.24 Block RAM 581 912 63.71 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. Deep Learning Processor Estimator Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 40142478 0.18247 1 40142478 5.5 ____imageinput_norm 216472 0.00098 ____conv_1 6825671 0.03103 ____maxpool_1 3755088 0.01707 ____conv_2 10440701 0.04746 ____maxpool_2 1447840 0.00658 ____conv_3 9405685 0.04275 ____maxpool_3 1765856 0.00803 ____conv_4 1819636 0.00827 ____maxpool_4 28098 0.00013 ____fc_1 2651288 0.01205 ____fc_2 1696632 0.00771 ____fc_3 89511 0.00041 * The clock frequency of the DL processor is: 220MHz Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 168645 274080 61.53 DSPs 800 2520 31.75 Block RAM 453 912 49.67 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Finished writing input activations. ### Running single input activation.
prediction_FPGA = struct with fields:
NumSamples: 1
MetricResults: [1×1 struct]
Statistics: [2×7 table]
View Performance of Quantized Neural Network
Display the accuracy of the quantized network.
prediction_simulation.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
prediction_FPGA.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
Display the performance of the quantized network in frames per second.
prediction_FPGA.Statistics.FramesPerSecond(2)
ans = 19.0828
Import a dlquantizer
Object into the Deep Network Quantizer App
This example shows you how to import a dlquantizer
object from the base workspace into the Deep Network Quantizer app. This allows you to begin quantization of a deep neural network using the command line or the app, and resume your work later in the app.
Open the Deep Network Quantizer app.
deepNetworkQuantizer
In the app, click New and select Import dlquantizer object
.
In the dialog, select the dlquantizer
object to import from the
base workspace. For this example, use quantObj
that you create in
the above example Quantize a Neural Network for GPU Target.
The app imports any data contained in the dlquantizer
object that
was collected at the command line. This data can include the network to quantize,
calibration data, validation data, and calibration statistics.
The app displays a table containing the calibration data contained in the imported
dlquantizer
object, quantObj
. To the right
of the table, the app displays histograms of the dynamic ranges of the parameters.
The gray regions of the histograms indicate data that cannot be represented by the
quantized representation. For more information on how to interpret these histograms,
see Quantization of Deep Neural Networks.
Emulate Target Agnostic Quantized Network
This example shows how to create a target agnostic, simulatable quantized deep neural network in MATLAB.
Target agnostic quantization allows you to see the effect quantization has on your neural network without target hardware or target-specific quantization schemes. Creating a target agnostic quantized network is useful if you:
Do not have access to your target hardware.
Want to preview whether or not your network is suitable for quantization.
Want to find layers that are sensitive to quantization.
Quantized networks emulate quantized behavior for quantization-compatible layers. Network architecture like layers and connections are the same as the original network, but inference behavior uses limited precision types. Once you have quantized your network, you can use the quantizationDetails function to retrieve details on what was quantized.
Load the pretrained network. net
is a SqueezeNet network that has been retrained using transfer learning to classify images in the MerchData
data set.
load squeezenetmerch
net
net = DAGNetwork with properties: Layers: [68×1 nnet.cnn.layer.Layer] Connections: [75×2 table] InputNames: {'data'} OutputNames: {'new_classoutput'}
You can use the quantizationDetails
function to see that the network is not quantized.
qDetailsOriginal = quantizationDetails(net)
qDetailsOriginal = struct with fields:
IsQuantized: 0
TargetLibrary: ""
QuantizedLayerNames: [0×0 string]
QuantizedLearnables: [0×3 table]
Unzip and load the MerchData
images as an image datastore.
unzip('MerchData.zip') imds = imageDatastore('MerchData', ... 'IncludeSubfolders',true, ... 'LabelSource','foldernames');
Define calibration and validation data to use for quantization. The output size of the images are changed for both calibration and validation data according to network requirements.
[calData,valData] = splitEachLabel(imds,0.7,'randomized');
augCalData = augmentedImageDatastore([227 227],calData);
augValData = augmentedImageDatastore([227 227],valData);
Create dlquantizer
object and specify the network to quantize. Set the execution environment to MATLAB. How the network is quantized depends on the execution environment. The MATLAB execution environment is agnostic to the target hardware and allows you to prototype quantized behavior.
quantObj = dlquantizer(net,'ExecutionEnvironment','MATLAB');
Use the calibrate
function to exercise the network with sample inputs and collect range information. The calibrate
function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
calResults = calibrate(quantObj,augCalData);
Use the quantize
method to quantize the network object and return a simulatable quantized network.
qNet = quantize(quantObj)
qNet = Quantized DAGNetwork with properties: Layers: [68×1 nnet.cnn.layer.Layer] Connections: [75×2 table] InputNames: {'data'} OutputNames: {'new_classoutput'} Use the quantizationDetails method to extract quantization details.
You can use the quantizationDetails
function to see that the network is now quantized.
qDetailsQuantized = quantizationDetails(qNet)
qDetailsQuantized = struct with fields:
IsQuantized: 1
TargetLibrary: "none"
QuantizedLayerNames: [26×1 string]
QuantizedLearnables: [52×3 table]
Make predictions using the original, single-precision floating-point network, and the quantized INT8 network.
predOriginal = classify(net,augValData); % Predictions for the non-quantized network predQuantized = classify(qNet,augValData); % Predictions for the quantized network
Compute the relative accuracy of the quantized network as compared to the original network.
ccrQuantized = mean(predQuantized == valData.Labels)*100
ccrQuantized = 100
ccrOriginal = mean(predOriginal == valData.Labels)*100
ccrOriginal = 100
For this validation data set, the quantized network gives the same predictions as the floating-point network.
Version History
Introduced in R2020aR2023a: dlnetwork
support
dlquantizer
now supports dlnetwork
objects for
quantization using the quantize
function.
R2023a: Quantize yolov3ObjectDetector
and yolov4ObjectDetector
using dlquantizer
yolov3ObjectDetector
(Computer Vision Toolbox) and yolov4ObjectDetector
(Computer Vision Toolbox) objects can now be quantized using dlquantizer
.
R2022b: dlnetwork
support
dlquantizer
now supports dlnetwork
objects for
quantization using the calibrate
and validate
functions.
R2022a: Validate the performance of quantized network for CPU target
You can now use the dlquantizer
object and the
validate
function to quantize a network and generate code for CPU
targets.
R2022a: Quantize neural networks without a specific target
Specify MATLAB
as the ExecutionEnvironment
to
quantize your neural networks without generating code or committing to a specific target for
code deployment. This can be useful if you:
Do not have access to your target hardware.
Want to inspect your quantized network without generating code.
Your quantized network implements int8
data instead of
single
data. It keeps the same layers and connections as the original
network, and it has the same inference behavior as it would when running on hardware.
Once you have quantized your network, you can use the
quantizationDetails
function to inspect your quantized network.
Additionally, you also have the option to deploy the code to a GPU target.
See Also
Apps
Functions
calibrate
|quantize
|validate
|dlquantizationOptions
|quantizationDetails
|estimateNetworkMetrics
Topics
- Quantization of Deep Neural Networks
- Quantize Residual Network Trained for Image Classification and Generate CUDA Code
- Quantize Layers in Object Detectors and Generate CUDA Code
- Quantize Network for FPGA Deployment
- Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
- Parameter Pruning and Quantization of Image Classification Network
Abrir ejemplo
Tiene una versión modificada de este ejemplo. ¿Desea abrir este ejemplo con sus modificaciones?
Comando de MATLAB
Ha hecho clic en un enlace que corresponde a este comando de MATLAB:
Ejecute el comando introduciéndolo en la ventana de comandos de MATLAB. Los navegadores web no admiten comandos de MATLAB.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)