detectSpeech
Detect boundaries of speech in audio signal
Syntax
Description
idx = detectSpeech(audioIn,fs,Name,Value) specifies options using one or more Name,Value pair arguments.
Example: detectSpeech(audioIn,fs,'Window',hann(512,'periodic'),'OverlapLength',256)
        detects speech using a 512-point periodic Hann window with 256-point overlap.
[idx,thresholds] = detectSpeech(___) also returns the thresholds used to compute the boundaries of speech.
detectSpeech(___) with no output arguments displays a
        plot of the detected speech regions in the input signal.
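
As a minimal sketch of the syntaxes above, the following calls detectSpeech on a synthetic test signal. The signal (a noise burst between two stretches of low-level noise), the sample rate, and the variable names are assumptions made for illustration; they stand in for recorded speech, and the boundaries returned depend entirely on the input.

```matlab
fs = 16000;                                   % assumed sample rate in Hz
rng(0);                                       % reproducible noise
quiet = 1e-4*randn(fs,1);                     % 1 s of low-level background noise
burst = 0.5*randn(fs,1);                      % 1 s stand-in for speech (not real speech)
audioIn = [quiet; burst; quiet];

% Name,Value syntax: 512-point periodic Hann window with 256-point overlap
win = hann(512,'periodic');
[idx,thresholds] = detectSpeech(audioIn,fs,'Window',win,'OverlapLength',256);

% With no output arguments, detectSpeech plots the detected regions
detectSpeech(audioIn,fs,'Window',win,'OverlapLength',256)
```

Here idx holds the detected boundaries and thresholds holds the values used to compute them, as described in the syntaxes above.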
Examples
Input Arguments
Name-Value Arguments
Output Arguments
Algorithms
The detectSpeech algorithm is based on [1], although modified so that
      the statistics to threshold are short-term energy and spectral spread, instead of short-term
      energy and spectral centroid. The following steps provide a high-level overview of the
      algorithm. For details, see [1].

- The audio signal is converted to a time-frequency representation using the specified Window and OverlapLength.
- The short-term energy and spectral spread are calculated for each frame. The spectral spread is calculated according to spectralSpread.
- Histograms are created for both the short-term energy and spectral spread distributions. 
- For each histogram, a threshold is determined according to $T = \frac{W \times M_1 + M_2}{W + 1}$, where $M_1$ and $M_2$ are the first and second local maxima, respectively. $W$ is set to 5.
- Both the spectral spread and the short-term energy are smoothed across time by passing through successive five-element moving median filters. 
- Masks are created by comparing the short-term energy and spectral spread with their respective thresholds. To declare a frame as containing speech, a feature must be above its threshold. 
- The masks are combined. For a frame to be declared as speech, both the short-term energy and the spectral spread must be above their respective thresholds. 
- Regions declared as speech are merged if the distance between them is less than MergeDistance.
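
The steps above can be sketched in base MATLAB as follows. This is an illustrative approximation under stated assumptions, not the toolbox implementation: the test signal, the 15-bin histograms, the peak-picking rule, the fallback threshold, the use of exactly two median passes, and the merge distance of five window lengths are all choices made for this sketch. The spectral spread is computed as the power-weighted standard deviation of frequency around the spectral centroid, and the threshold uses the weighted average (W*M1 + M2)/(W + 1) from [1].

```matlab
% Illustrative sketch only; not the toolbox implementation.
fs = 16000;                                   % assumed sample rate (Hz)
rng(0);
t = (0:fs-1)'/fs;
quiet = 1e-3*sin(2*pi*100*t);                 % low-level narrowband hum
loud  = 0.5*randn(fs,1);                      % broadband stand-in for speech
x = [quiet; loud; quiet];

% Step 1: time-frequency representation (assumed Window and OverlapLength)
N = 512; overlap = 256; hop = N - overlap;
win = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);         % periodic Hann window
numFrames = floor((numel(x) - N)/hop) + 1;
frames = zeros(N,numFrames);
for k = 1:numFrames
    frames(:,k) = x((k-1)*hop + (1:N)).*win;
end
S = abs(fft(frames)).^2;
S = S(1:N/2+1,:);                             % one-sided power spectrum
f = (0:N/2)'*fs/N;                            % bin frequencies (Hz)

% Step 2: short-term energy and spectral spread per frame
energy = sum(frames.^2,1);
centroid = sum(f.*S,1)./sum(S,1);
spread = sqrt(sum((f - centroid).^2.*S,1)./sum(S,1));

% Steps 3-4: histogram each feature and derive its threshold
W = 5;
feats = [energy; spread];
T = zeros(2,1);
for j = 1:2
    [counts,edges] = histcounts(feats(j,:),15);  % bin count is an assumption
    counts = counts(:).'; edges = edges(:).';
    centers = (edges(1:end-1) + edges(2:end))/2;
    p = [0 counts 0];
    isPeak = p(2:end-1) > p(1:end-2) & p(2:end-1) >= p(3:end);
    M = centers(isPeak);                         % local maxima, in order
    if numel(M) >= 2
        T(j) = (W*M(1) + M(2))/(W + 1);
    else
        T(j) = mean(feats(j,:))/2;               % fallback invented for this sketch
    end
end

% Step 5: smooth with successive five-element moving median filters
energyS = movmedian(movmedian(energy,5),5);
spreadS = movmedian(movmedian(spread,5),5);

% Steps 6-7: a frame is speech only if both features exceed their thresholds
isSpeech = (energyS > T(1)) & (spreadS > T(2));

% Step 8: convert runs of speech frames to sample indices and merge close regions
d = diff([0 double(isSpeech) 0]);
startIdx = (find(d == 1) - 1)*hop + 1;
endIdx   = (find(d == -1) - 2)*hop + N;
mergeDistance = 5*N;                          % assumed value, in samples
k = 1;
while k < numel(startIdx)
    if startIdx(k+1) - endIdx(k) < mergeDistance
        endIdx(k) = endIdx(k+1);              % absorb the next region
        startIdx(k+1) = []; endIdx(k+1) = [];
    else
        k = k + 1;
    end
end
idx = [startIdx(:) endIdx(:)];                % one row per detected region
```

For this synthetic input the two features separate cleanly, so the sketch reports a single region around the loud segment. Real speech typically produces less distinct histograms, which is where the choice of W and the smoothing matter.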
References
[1] Giannakopoulos, Theodoros. "A Method for Silence Removal and Segmentation of Speech Signals, Implemented in MATLAB", (University of Athens, Athens, 2009).
Extended Capabilities
Version History
Introduced in R2020a












