Deploy Smart Speaker Model on Qualcomm Hexagon DSP

This example shows how to deploy a Simulink model designed as a smart speaker system on the Qualcomm® Hexagon® DSP using the Embedded Coder® Support Package for Qualcomm Hexagon Processors.

Refer to Apply Speech Command Recognition Network in Smart Speaker Simulink Model (Audio Toolbox) for more information about the Simulink® model implementation.

This Simulink model comprises two main parts:

  • An audio input path, representing microphone processing. This path applies preprocessing followed by a speech command recognizer, which extracts an auditory (Bark) spectrogram and uses a precompiled MATLAB model to predict the speech command from the audio.

  • An audio output path, representing loudspeaker output processing. This path preprocesses the audio playback stream and uses the speech command from the microphone path to act on the playback. You make the smart speaker play music with the command "Go" and stop playing music with the command "Stop". You increase or decrease the music volume with the commands "Up" and "Down", respectively.

In this example, both inputs are supplied using the From Workspace Simulink block.

The HVX Instruction Set Extension (HVX ISE) includes features like SIMD, reduction instructions, and FMA, which enhance computational efficiency. SIMD, or Single Instruction Multiple Data, allows a processor to execute a single instruction on multiple data points simultaneously, thereby accelerating computations by performing numerous operations in parallel.

Reduction operations collapse a list of numbers into a single result by repeatedly applying a mathematical function, such as addition or multiplication, to pairs of values until only one remains. A closely related operation is FMA (fused multiply-add), which computes a multiplication and an addition (a*b + c) as a single instruction. FMA appears naturally inside reductions such as dot products, where each step multiplies two values and adds the product to a running sum. Combining the two operations reduces the number of instructions needed to reach the final result, which is particularly beneficial on processors like Qualcomm Hexagon.

When optimizing code for a smart speaker, these instruction set extensions can significantly increase processing speed. By comparing the optimized code with plain C code, you can evaluate the performance improvement that these instructions provide.
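The following MATLAB sketch illustrates the ideas above on a small dot product. It is not the code that Embedded Coder generates and it does not use HVX; the input values, the lane width of 4, and all variable names are assumptions chosen for illustration. It only shows the pattern that SIMD, reduction, and FMA instructions target: several multiply-adds per step, followed by a final reduction of the partial sums.

```matlab
% Sketch only: a dot product written as an explicit reduction.
x = single([1 2 3 4 5 6 7 8]);            % hypothetical input samples
w = single(0.5) * ones(1, 8, 'single');   % hypothetical weights

% Scalar form: one multiply-accumulate per iteration. Each update has the
% shape acc = acc + a*b, which is the pattern an FMA instruction covers.
accScalar = single(0);
for k = 1:numel(x)
    accScalar = accScalar + x(k) * w(k);
end

% SIMD-style form: handle the data in lanes of 4, keep one partial sum per
% lane, then reduce the lane sums to a single value at the end.
laneCount = 4;                            % assumed lane width
laneSums = zeros(1, laneCount, 'single');
for k = 1:laneCount:numel(x)
    idx = k:(k + laneCount - 1);
    laneSums = laneSums + x(idx) .* w(idx);   % four multiply-adds per step
end
accLanes = sum(laneSums);                 % final reduction step

fprintf('scalar = %g, lane-based = %g\n', accScalar, accLanes);
```

Both forms compute the same value here; the lane-based form needs a quarter of the loop iterations, which is where the speedup of a vector unit comes from.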

Required Hardware

  • Qualcomm Hexagon Simulator

Smart Speaker Model

To open the model, execute this command:

open_system('SmartSpeakerHexagon');

Observe the Spectrum Analyzer. For execution in host simulation, click on Run in the Simulation tab. The default simulation time is set to 3 seconds. Alternatively, you can use the command:

sim('SmartSpeakerHexagon',SimulationMode='normal');

Audio output waveform

Deep Learning network classifications

Configure Model

You can configure the model using either the Interactive Approach (Configuration Parameters in Simulink) or the Programmatic Approach (MATLAB programming interface).

Interactive Approach

Configure the model for code generation on Qualcomm Hexagon Simulator in the same way as explained in the Getting Started with Embedded Coder Support Package for Qualcomm Hexagon Processors example. Ensure that the Enable HVX option is selected to use the instruction set extensions. Additionally, set the Code Replacement Library to None, because the CRL takes precedence over the HVX ISE if both are enabled.

  • To set the HVX ISE, under Code Generation > Interface > Optimization, select HVX for the Leverage target hardware instruction set extensions parameter. Also select the Optimize reductions and FMA (Fused Multiply Add) parameters so that the generated code includes the corresponding SIMD instructions.

Programmatic Approach

  • To configure the Simulink model SmartSpeakerHexagon.slx for deployment on Qualcomm Hexagon Simulator, run these commands:

set_param('SmartSpeakerHexagon','HardwareBoard','Qualcomm Hexagon Simulator');
set_param('SmartSpeakerHexagon','SystemTargetFile','ert.tlc');
set_param('SmartSpeakerHexagon','BuildConfiguration','Faster Runs');

  • Enable HVX, set the instruction set extensions to HVX, turn on Optimize reductions and FMA, and set the code replacement library to None to produce code that is optimized for Qualcomm Hexagon Simulator.

% Enable HVX and select the processor version in the target hardware data
targetInfo = get_param('SmartSpeakerHexagon','CoderTargetData');
targetInfo.Device.EnableHVX = 1;
targetInfo.Device.ProcessorVersion = 'V73';
set_param('SmartSpeakerHexagon','CoderTargetData',targetInfo);
% Use the HVX instruction set extensions with reduction and FMA optimizations
set_param('SmartSpeakerHexagon','InstructionSetExtensions','HVX');
set_param('SmartSpeakerHexagon','OptimizeReductions','On');
set_param('SmartSpeakerHexagon','InstructionSetFMA','On');
% Set the CRL to None so that it does not take precedence over the HVX ISE
set_param('SmartSpeakerHexagon','CodeReplacementLibrary','None');
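
After applying the settings, it can be useful to read them back before building. The sketch below assumes that the SmartSpeakerHexagon model is already loaded and that Simulink with the support package is installed; it only uses get_param with the parameter names that appear in the commands above. The variable names model and settingNames are illustrative.

```matlab
% Sketch: read back the configuration parameters set above.
% Assumes the model is loaded (for example, with open_system).
model = 'SmartSpeakerHexagon';
settingNames = {'HardwareBoard', 'SystemTargetFile', ...
                'InstructionSetExtensions', 'OptimizeReductions', ...
                'InstructionSetFMA', 'CodeReplacementLibrary'};

for k = 1:numel(settingNames)
    value = get_param(model, settingNames{k});
    disp(settingNames{k});   % parameter name
    disp(value);             % current value, displayed in whatever type it has
end
```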

Generate Code

You can now generate code for the Smart Speaker model and run it on the target.

  • Press Ctrl+B, or open the Embedded Coder app and click Build.

Once the code is generated, you can view the generated code for the HVX ISE by clicking View Code.

Verify on Target Using SIL/PIL Manager

To perform numerical accuracy verification of the generated code against the simulation output, use the SIL/PIL Manager App.

  1. Go to SIL/PIL Manager.

  2. Set Mode to Automated Verification.

  3. Set the SIL/PIL Mode to Processor-in-the-loop (PIL).

  4. Click Run Verification.

Alternatively, execute these commands to run the model programmatically in PIL mode:

set_param('SmartSpeakerHexagon','SimulationMode','processor-in-the-loop (pil)');
outputWithHVXISE = sim('SmartSpeakerHexagon');

You can verify the numerical accuracy using the Simulation Data Inspector. For detailed instructions, refer to the Getting Started with Embedded Coder Support Package for Qualcomm Hexagon Processors example.
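
As a rough illustration of what a numerical accuracy check involves, the sketch below compares two signals against an absolute tolerance. The signal values and the tolerance are hypothetical placeholders, not data from this model; in practice the two vectors would come from the logged outputs of the normal-mode and PIL simulations, and the acceptable tolerance depends on the application.

```matlab
% Sketch: compare a reference output with a PIL output.
% Both vectors below are made-up placeholder data.
refOutput = [0.10 0.25 -0.40 0.80];               % e.g. normal-mode result
pilOutput = refOutput + [0 1e-7 -2e-7 0];         % e.g. PIL result

absTol = 1e-5;                                    % assumed acceptance tolerance
maxAbsErr = max(abs(pilOutput - refOutput));      % worst-case deviation
withinTol = maxAbsErr <= absTol;

fprintf('max abs error = %.3g, within tolerance = %d\n', ...
    maxAbsErr, double(withinTol));
```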

Compare Performance

Compare the performance of a particular block with plain C code (without HVX ISE) and with HVX ISE code.

The calculation estimates the execution time for a 1-second duration by scaling the total execution times of the two step functions. During the 3-second run, SmartSpeakerHexagon_step0 is called 301 times and SmartSpeakerHexagon_step1 is called 31 times. These totals are normalized to the 100 calls of SmartSpeakerHexagon_step0 and the 10 calls of SmartSpeakerHexagon_step1 that occur in 1 second.

ticksWithHVXISE = ((executionProfile.Sections(2).TotalExecutionTimeInTicks) * (100/301)) + ((executionProfile.Sections(3).TotalExecutionTimeInTicks) * (10/31))

You can observe that the step function with the HVX instruction set extension consumes 205806206 cycles. Additionally, the Code Profile Analyzer provides insights into the average execution time. Refer to the Analyze Profile Information section of the Getting Started with Embedded Coder Support Package for Qualcomm Hexagon Processors example.

Repeat the same steps without selecting an instruction set extension.
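
The normalization can be checked on round numbers. The sketch below uses hypothetical tick totals (not measured values) together with the call counts quoted in this section, and computes the per-second figure as an average per call multiplied by the number of calls per second, which is equivalent to the scaling factors 100/301 and 10/31.

```matlab
% Sketch: normalize 3-second tick totals to a 1-second estimate.
% The tick totals are hypothetical round numbers for illustration.
totalTicksStep0 = 301000;    % total over all step0 calls in the run
totalTicksStep1 = 31000;     % total over all step1 calls in the run

callsStep0 = 301;  callsPerSecStep0 = 100;   % counts quoted in this section
callsStep1 = 31;   callsPerSecStep1 = 10;

avgTicksStep0 = totalTicksStep0 / callsStep0;    % average ticks per step0 call
avgTicksStep1 = totalTicksStep1 / callsStep1;    % average ticks per step1 call

ticksPerSecond = avgTicksStep0 * callsPerSecStep0 + ...
                 avgTicksStep1 * callsPerSecStep1;

fprintf('estimated ticks for 1 second = %d\n', ticksPerSecond);
```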

set_param('SmartSpeakerHexagon','InstructionSetExtensions','None');

outputWithoutISE = sim('SmartSpeakerHexagon');

ticksWithoutISE = ((executionProfile.Sections(2).TotalExecutionTimeInTicks) * (100/301)) + ((executionProfile.Sections(3).TotalExecutionTimeInTicks) * (10/31))

Without an ISE or CRL, the step function consumes around 672907053 cycles.

Similarly, with the Qualcomm Hexagon CRL combined with the HVX instruction set extensions, the step function consumes 91381631 cycles. This figure shows the performance comparison between the HVX Instruction Set Extension, Qualcomm Hexagon CRL, and Plain-C.

Based on these cycle counts, the HVX ISE implementation is about 3.27 times faster than Plain-C (672907053/205806206), while the CRL + HVX ISE implementation is about 7.36 times faster than Plain-C (672907053/91381631).
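
The speedup factors follow directly from the cycle counts reported in this section. A minimal sketch of the arithmetic, with illustrative variable names:

```matlab
% Cycle counts reported in this example
ticksPlainC = 672907053;   % no ISE, no CRL
ticksHVXISE = 205806206;   % HVX instruction set extensions
ticksCRLHVX = 91381631;    % Qualcomm Hexagon CRL with HVX ISE

% Speedup relative to plain C = baseline cycles / optimized cycles
speedupISE = ticksPlainC / ticksHVXISE;
speedupCRL = ticksPlainC / ticksCRLHVX;

fprintf('HVX ISE speedup:       %.2f\n', speedupISE);
fprintf('CRL + HVX ISE speedup: %.2f\n', speedupCRL);
```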