enhanceSpeech

Enhance speech signal

Since R2024a

Syntax

audioOut = enhanceSpeech(audioIn,fs)

enhanceSpeech(audioIn,fs)

Description

audioOut = enhanceSpeech(audioIn,fs) enhances the speech in the audio signal audioIn, sampled at rate fs, by reducing non-speech sounds.

enhanceSpeech(audioIn,fs) with no output arguments displays a plot of the original and enhanced speech.

This function requires both Audio Toolbox™ and Deep Learning Toolbox™.

Examples

Download `enhanceSpeech` Functionality

Try calling enhanceSpeech at the command line. If the required model files are not installed, the function throws an error and provides a link to download them. Click the link, then unzip the downloaded file to a location on the MATLAB path.

Alternatively, execute the following commands to download and unzip the enhanceSpeech model files to your temporary directory.

downloadFolder = fullfile(tempdir,"enhanceSpeechDownload");
loc = websave(downloadFolder,"https://ssd.mathworks.com/supportfiles/audio/enhanceSpeech.zip");
modelsLocation = tempdir;
unzip(loc,modelsLocation)
addpath(fullfile(modelsLocation,"enhanceSpeech"))

Enhance Speech Signal

Read in an audio file containing speech and noise. Listen to the signal.

[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
sound(noisySpeech,fs)

Use enhanceSpeech to reduce the non-speech sounds in the signal. Listen to the enhanced signal.

enhancedSpeech = enhanceSpeech(noisySpeech,fs);
sound(enhancedSpeech,fs)

Call enhanceSpeech with no output arguments to plot both the noisy signal and the enhanced signal.

enhanceSpeech(noisySpeech,fs);

Use STOI to Evaluate Enhanced Speech Signal

Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

Calculate the STOI metric for the noisy speech signal using stoi.

noisySpeechSTOI = stoi(noisySpeech,reference,fs)

noisySpeechSTOI = 0.8370

Use enhanceSpeech to enhance the speech signal. Evaluate the enhanced signal using the STOI metric and see the improvement compared to the STOI of the noisy signal.

enhancedSpeech = enhanceSpeech(noisySpeech,fs);
enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs)

enhancedSpeechSTOI = single
    0.8808
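
To put the two scores side by side, the following sketch computes the absolute and relative change in STOI. It only uses the values shown in the outputs above; the variable names stoiGain and stoiGainPercent are illustrative and not part of the original example.

```matlab
% Scores copied from the example outputs above.
noisySpeechSTOI = 0.8370;
enhancedSpeechSTOI = 0.8808;

% Absolute and relative improvement in STOI (illustrative variable names).
stoiGain = enhancedSpeechSTOI - noisySpeechSTOI;
stoiGainPercent = 100*stoiGain/noisySpeechSTOI;

fprintf("STOI improved by %.4f (%.1f%% relative)\n",stoiGain,stoiGainPercent)
```

For this file, the enhancement raises the STOI score by about 0.044, or roughly 5% relative to the noisy signal.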

Use ViSQOL to Evaluate Enhanced Speech Signal

Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

Calculate the ViSQOL metric for the noisy speech signal using visqol.

noisySpeechMOS = visqol(noisySpeech,reference,fs,Mode="speech")

noisySpeechMOS = 2.9550

Use enhanceSpeech to enhance the speech signal. Evaluate the enhanced signal using the ViSQOL metric and see the improvement compared to the noisy signal.

enhancedSpeech = enhanceSpeech(noisySpeech,fs);
enhancedSpeechMOS = visqol(enhancedSpeech,reference,fs,Mode="speech")

enhancedSpeechMOS = single
    3.2205

Input Arguments

`audioIn` — Audio input
column vector

Audio input containing the speech signal to enhance, specified as a column vector (single channel).

Data Types: single | double

`fs` — Sample rate (Hz)
positive scalar

Sample rate in Hz, specified as a positive scalar. The enhanceSpeech function requires a sample rate of at least 4000 Hz.

Data Types: single | double
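
The input requirements above (single-channel column vector, single or double data type, sample rate of at least 4000 Hz) can be checked before calling the function. The following is a minimal sketch that uses only base MATLAB functions. The two-channel test signal and the names stereoIn, monoIn, minRate, and isReady are assumptions made for illustration, and averaging the channels is just one common way to downmix.

```matlab
% Hypothetical input: 1 second of two-channel noise at 16 kHz.
fs = 16000;
stereoIn = 0.1*randn(fs,2);

% enhanceSpeech expects a single-channel column vector, so average the
% channels and force a column orientation.
monoIn = mean(stereoIn,2);
monoIn = monoIn(:);

% Cast to one of the supported data types (single or double).
monoIn = double(monoIn);

% The documented minimum sample rate is 4000 Hz.
minRate = 4000;
isReady = iscolumn(monoIn) && isfloat(monoIn) && fs >= minRate;
```

If isReady is true, monoIn and fs satisfy the documented input requirements.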

Output Arguments

`audioOut` — Audio output
column vector

Audio output containing the enhanced speech signal, returned as a column vector with the same size and sample rate as the input signal.

Data Types: single
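
As a quick check of the output properties described above, the following sketch compares the output to the input. It assumes the model files are installed and reuses the audio file from the examples. The names sameSize and outClass are illustrative.

```matlab
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
enhancedSpeech = enhanceSpeech(noisySpeech,fs);

% The output is documented to have the same size as the input.
sameSize = isequal(size(enhancedSpeech),size(noisySpeech));

% The output is documented as single precision, even though audioread
% returns double by default.
outClass = class(enhancedSpeech);

% Optional: cast back to double if downstream code expects it.
enhancedSpeechDouble = double(enhancedSpeech);
```

The single-precision output is presumably why the STOI and ViSQOL results for the enhanced signal in the examples are displayed as single.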

Algorithms

The enhanceSpeech function uses a pretrained MetricGAN-OKD [1] neural network to enhance speech signals.

References

[1] Shin, Wooseok, Byung Hoon Lee, Jin Sob Kim, Hyun Joon Park, and Sung Won Han. "MetricGAN-OKD: multi-metric optimization of MetricGAN via online knowledge distillation for speech enhancement." In International Conference on Machine Learning, pp. 31521-31538. PMLR, 2023.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
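
A minimal sketch of GPU usage follows. It assumes that passing a gpuArray input is how the function runs on the GPU, which this page does not spell out, and that Parallel Computing Toolbox and a supported GPU are available. The code falls back to the CPU otherwise.

```matlab
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");

if canUseGPU
    % Assumption: a gpuArray input moves the computation to the GPU.
    enhancedSpeech = enhanceSpeech(gpuArray(noisySpeech),fs);
else
    enhancedSpeech = enhanceSpeech(noisySpeech,fs);
end

% gather copies GPU data back to host memory and leaves ordinary arrays
% unchanged, so it is safe to call in both branches.
enhancedSpeech = gather(enhancedSpeech);
```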

Version History

Introduced in R2024a
