Deploy Smart Speaker System on Raspberry Pi Using Simulink
This example demonstrates how to deploy a smart speaker system on Raspberry Pi® using Simulink®. A smart speaker is a speaker that you control with your voice. You run the smart speaker Simulink model on Raspberry Pi in External Mode. The voice commands are captured through a USB microphone connected to your Raspberry Pi board. Optionally, you can input voice commands from pre-recorded files. The smart speaker model plays the audio on the speaker connected to the Raspberry Pi. The command "Go" starts music playback and the command "Stop" stops it. The commands "Up" and "Down" increase and decrease the music volume, respectively. For details about modeling the various modules used in the smart speaker model, see Model Smart Speaker in Simulink.
Smart Speaker Model
The model can be divided into four sub-modules that perform four sub-tasks:
Capture 16-bit speech samples and convert them to single precision format in the range [-1,1)
Recognize speech commands
Prepare audio frame based on the recognized speech commands
Convert audio samples to 16-bit signed integer format and play the audio on Raspberry Pi
modelName = "AudioSmartSpeakerOnRaspberryPi";
open_system(modelName)
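The third sub-task, preparing the audio frame from the recognized commands, can be pictured as a small state update. The following is a minimal sketch in plain MATLAB, not the logic of the shipped model: the variable names, the starting volume, and the volume step are all assumptions made for illustration.

```matlab
% Hypothetical sequence of recognized commands (illustrative only).
commands = ["go" "up" "up" "down" "stop"];

isPlaying  = false;   % playback state toggled by "go" and "stop"
volume     = 0.5;     % assumed starting volume, kept within [0, 1]
volumeStep = 0.25;    % assumed volume increment per "up"/"down"

for cmd = commands
    switch cmd
        case "go"
            isPlaying = true;
        case "stop"
            isPlaying = false;
        case "up"
            volume = min(volume + volumeStep, 1);   % clamp at full volume
        case "down"
            volume = max(volume - volumeStep, 0);   % clamp at silence
    end
end

% Scale a frame of music by the current state; a stopped speaker outputs zeros.
musicFrame  = single(0.5*ones(4,1));
outputFrame = single(isPlaying) * single(volume) * musicFrame;
```

Clamping the volume keeps the scaled samples inside the range the playback path expects. For the actual implementation, see Model Smart Speaker in Simulink.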
Configure Audio I/O Blocks
The smart speaker model uses the ALSA Audio Capture (Simulink) block to capture voice commands from a microphone connected to your Raspberry Pi board, and the ALSA Audio Playback (Simulink) block to play audio on a speaker connected to the board. The ALSA audio I/O blocks come with Simulink Support Package for Raspberry Pi Hardware. After connecting the microphone and speaker to your Raspberry Pi board, list the audio capture and audio playback devices using listAudioDevices (Simulink).
r = raspi("raspiname","pi","password");
audioCaptureDevicesList = listAudioDevices(r,"capture");
audioPlaybackDevicesList = listAudioDevices(r,"playback");
Set the Device name in the ALSA Audio Capture:Block Parameters dialog to the device of your choice from audioCaptureDevicesList. Similarly, set the Device name in the ALSA Audio Playback:Block Parameters dialog to the playback device of your choice from audioPlaybackDevicesList.
Display the details of an audio capture device and an audio playback device from audioCaptureDevicesList and audioPlaybackDevicesList.
audioCaptureDevicesList(1)
ans =
struct with fields:
            Name: 'USB-Audio-LogitechUSBHeadsetH340-LogitechInc.LogitechUSBHeadsetH340atusb-0000:01:00.0-1.2,fullspeed'
          Device: '2,0'
        Channels: {'2'}
        BitDepth: {'16-bit integer'}
    SamplingRate: {'44100'}
audioPlaybackDevicesList(3)
ans =
struct with fields:
            Name: 'USB-Audio-LogitechUSBHeadsetH340-LogitechInc.LogitechUSBHeadsetH340atusb-0000:01:00.0-1.2,fullspeed'
          Device: '2,0'
        Channels: {'2'}
        BitDepth: {'16-bit integer'}
    SamplingRate: {'44100'}
To use the above devices, set the Device name in both the ALSA Audio Capture:Block Parameters and ALSA Audio Playback:Block Parameters dialogs to plughw:2,0. Set the Audio sampling frequency (Hz) to 16000, because the convolutional neural network (CNN) that recognizes the voice commands was trained on audio sampled at 16000 Hz.
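The device name string follows from the Device field shown in the listing above. As a small sketch, the snippet below builds it from a stand-in struct so that it runs without a board; the variable names captureDevice and deviceName are illustrative, and on real hardware you would index into audioCaptureDevicesList instead.

```matlab
% Stand-in for one entry of audioCaptureDevicesList, using the Device value
% from the listing above (on hardware, use audioCaptureDevicesList(1)).
captureDevice = struct("Device", '2,0');

% Prefix the card,device pair with "plughw:" to form the Device name value.
deviceName = "plughw:" + captureDevice.Device;

disp(deviceName)
```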
The model provides a manual switch to change the audio source from the microphone to the pre-recorded audio files. You select the voice command using the Rotary switch. The model uses four Audio File Read (Simulink) blocks to read the audio files go.wav, stop.wav, up.wav, and down.wav. Note that the Audio File Read (Simulink) block is included in Simulink Support Package for Raspberry Pi Hardware.
Modify the Data Type of the Audio Samples
The ALSA Audio Capture (Simulink) and Audio File Read (Simulink) blocks output 16-bit signed integer audio samples with values in the interval [-32768, 32767]. You cast the output of these blocks to single precision and multiply it by 2^-15 to change the numerical range to [-1,1). You change the numerical range because the subsequent blocks expect audio in the range [-1,1).
The ALSA Audio Playback (Simulink) block expects 16-bit signed integers as input, so the output of the preceding block that prepares the audio frame must be converted to 16-bit signed integers. The floating-point audio frame samples lie in the range [-1,1). You multiply them by 2^15 to bring the range to [-32768, 32767], and then typecast the product to the int16 data type. These int16 audio frame samples can be fed to the ALSA Audio Playback (Simulink) block. The AudioSmartSpeakerOnRaspberryPi model uses the Gain (Simulink) block to multiply the audio samples by the constants 2^-15 or 2^15, and the Data Type Conversion (Simulink) block to typecast the audio samples to single or int16.
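The two conversions can be checked numerically outside Simulink. The sketch below mirrors the cast-and-gain steps in plain MATLAB, assuming scale factors of 2^-15 on the capture side and 2^15 on the playback side; the sample values and variable names are made up for illustration.

```matlab
% Example int16 samples spanning the full 16-bit range (illustrative values).
captured = int16([-32768 -16384 0 16384 32767]);

% Capture side: cast to single first, then scale by 2^-15 to reach [-1, 1).
% Casting before scaling avoids integer rounding of the scaled values.
normalized = single(captured) * single(2^-15);

% Playback side: scale by 2^15, then cast back to int16.
% int16() rounds to the nearest integer and saturates at the type limits,
% so a sample of exactly +1.0 would clip to 32767 rather than wrap around.
restored = int16(normalized * single(2^15));

disp(normalized)
disp(restored)
```

Because every int16 value divided by 2^15 is exactly representable in single precision, the round trip in this sketch returns the original samples unchanged.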
Configure Smart Speaker Model Settings and Run the Model in External Mode
Open the AudioSmartSpeakerOnRaspberryPi model. On the MODELING tab, click Model Settings, or press Ctrl+E, to open the Configuration Parameters dialog.
Select a solver that supports code generation. Set Solver to auto (Automatic solver selection) and Solver type to Fixed-step.
Select Code Generation and set the System Target File to ert.tlc, whose Description is Embedded Coder.
Set the Language to C++, which automatically sets the Language Standard to C++11 (ISO).
In Configuration > Hardware Implementation, set the Hardware board to Raspberry Pi and enter your Raspberry Pi credentials in the Board Parameters.
In the same window, set External mode > Communication interface to XCP on TCP/IP.
Check Signal logging in Configuration > Data Import/Export to enable signal monitoring in External Mode.
Go to the Hardware tab and click Monitor & Tune to run the model in external mode.
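Several of these dialog settings can also be applied from the command line with set_param. The following is a sketch rather than a complete replacement for the dialog steps: it assumes the support package is installed, uses the usual Simulink configuration parameter names (confirm them with get_param on your release), and leaves the board credentials and the XCP on TCP/IP interface to be set in the dialog.

```matlab
modelName = "AudioSmartSpeakerOnRaspberryPi";
load_system(modelName)

% Fixed-step solver with automatic solver selection.
set_param(modelName, "SolverType", "Fixed-step");
set_param(modelName, "Solver", "FixedStepAuto");

% Embedded Coder target and C++ code generation.
set_param(modelName, "SystemTargetFile", "ert.tlc");
set_param(modelName, "TargetLang", "C++");

% Target hardware; set after the system target file.
set_param(modelName, "HardwareBoard", "Raspberry Pi");

% Signal logging, needed to monitor signals in External Mode.
set_param(modelName, "SignalLogging", "on");
```

After running the sketch, open the Configuration Parameters dialog to confirm that each value matches the corresponding step.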