
extract

Extract audio features

Since R2019b

Description


features = extract(aFE,audioIn) returns an array containing the features of the audio input audioIn that are enabled on the audioFeatureExtractor object aFE.

features = extract(aFE,ds) extracts features from all of the audio files in the audioDatastore object ds.

features = extract(aFE,ds,Name=Value) specifies options using one or more name-value arguments. For example, extract(aFE,ds,UseParallel=true) reads the data and extracts features in parallel.
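Here is a minimal sketch of the datastore syntax. It assumes Audio Toolbox is installed and that you can write to a temporary folder. The file names, tone frequencies, and durations are hypothetical. They are chosen only so that the example does not depend on the sample files shipped with the toolbox.

```matlab
% Write two short synthetic mono files (hypothetical names) to a temporary folder
fs = 16e3;
folder = tempname;
mkdir(folder);
t = (0:fs-1).'/fs;                                   % 1 s of sample times
audiowrite(fullfile(folder,"toneA.wav"),0.5*sin(2*pi*440*t),fs);
audiowrite(fullfile(folder,"toneB.wav"),0.5*sin(2*pi*880*t(1:fs/2)),fs);

% Point a datastore at the folder and extract one feature per hop
ads = audioDatastore(folder);
aFE = audioFeatureExtractor(SampleRate=fs,spectralCentroid=true);
features = extract(aFE,ads);    % cell array with one element per file

% Each cell holds a numHops-by-numFeatures(-by-numChannels) array
disp(cellfun(@(x) size(x,1),features))   % number of hops in each file

rmdir(folder,"s")   % remove the temporary files
```

The shorter file produces fewer hops, so the per-file arrays differ in size. That is why a datastore input returns a cell array rather than a single numeric array.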

Examples


Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor to extract the centroid of the Bark spectrum, the kurtosis of the Bark spectrum, and the pitch of an audio signal.

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
    "pitch",true)
aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024x1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
        FeatureVectorLength: 3

   Enabled Features
     spectralCentroid, spectralKurtosis, pitch

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease
     spectralEntropy, spectralFlatness, spectralFlux, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, harmonicRatio, zerocrossrate, shortTimeEnergy


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

Call extract to extract the features from the audio signal. Normalize the features by their mean and standard deviation.

features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);
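The two normalization lines above apply a z-score to each feature column. The sketch below shows that base MATLAB's normalize function gives the same result, because its default method is "zscore" and, for a matrix with more than one row, it operates column by column. The random matrix demoFeatures is a hypothetical stand-in for the output of extract. With either form, a feature whose standard deviation is zero produces NaN values.

```matlab
% Hypothetical stand-in for extract output: 200 hops by 3 features on different scales
rng(0)
demoFeatures = randn(200,3).*[1 10 100] + [0 5 -50];

% Explicit z-score, as in the example above
zExplicit = (demoFeatures - mean(demoFeatures,1))./std(demoFeatures,[],1);

% Same normalization using normalize (default method is "zscore")
zBuiltin = normalize(demoFeatures);

maxDifference = max(abs(zExplicit - zBuiltin),[],"all")
```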

Plot the normalized features over time.

idx = info(aFE);
duration = size(audioIn,1)/fs;

subplot(2,1,1)
t = linspace(0,duration,size(audioIn,1));
plot(t,audioIn)

subplot(2,1,2)
t = linspace(0,duration,size(features,1));
plot(t,features(:,idx.spectralCentroid), ...
     t,features(:,idx.spectralKurtosis), ...
     t,features(:,idx.pitch));
legend("Spectral Centroid","Spectral Kurtosis", "Pitch")
xlabel("Time (s)")

Figure: The top axes plots the audio waveform. The bottom axes plots the normalized spectral centroid, spectral kurtosis, and pitch against Time (s).
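The plotting code above names each feature explicitly. The structure returned by info can also be looped over with fieldnames, so the plot adapts to whichever features are enabled. This sketch assumes Audio Toolbox. A synthetic frequency sweep stands in for the speech file. As in the example above, each field of idx is used to select that feature's column or columns.

```matlab
% Synthetic 2 s sweep from 200 Hz to 2 kHz (stand-in for the recorded file)
fs = 44.1e3;
t = (0:2*fs-1).'/fs;
audioIn = sin(2*pi*(200*t + 450*t.^2));

aFE = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true,spectralKurtosis=true,pitch=true);
features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);

idx = info(aFE);
names = fieldnames(idx);          % one field per enabled feature
tHops = linspace(0,size(audioIn,1)/fs,size(features,1));
figure
hold on
for k = 1:numel(names)
    plot(tHops,features(:,idx.(names{k})))   % dynamic field name selects the columns
end
hold off
legend(names)
xlabel("Time (s)")
```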

Create an audio datastore that points to audio samples included with Audio Toolbox®.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);
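With the default analysis window and overlap, consecutive feature vectors are separated by a hop length equal to the window length minus the overlap length. The sketch below reads both values from the object's Window and OverlapLength properties, which appear in the property display of the first example, and converts them to milliseconds. It assumes Audio Toolbox.

```matlab
aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true,barkSpectrum=true,erbSpectrum=true,linearSpectrum=true);

windowLength = numel(aFE.Window);                 % samples per analysis window
hopLength = windowLength - aFE.OverlapLength;     % samples between feature vectors
fprintf("Window: %d samples (%.1f ms), hop: %d samples (%.1f ms)\n", ...
    windowLength,1e3*windowLength/aFE.SampleRate, ...
    hopLength,1e3*hopLength/aFE.SampleRate)
fprintf("Features per hop: %d\n",aFE.FeatureVectorLength)
```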

Call extract to extract the features from each audio file in the datastore. Specify SampleRateMismatchRule as "resample" to resample the audio files in the datastore if they do not match 44.1 kHz, the sample rate of the audioFeatureExtractor object. If you have Parallel Computing Toolbox™, specify UseParallel as true to read the files and extract the features in parallel.

specs = extract(aFE,ads,SampleRateMismatchRule="resample",UseParallel=true);
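Setting UseParallel to true requires Parallel Computing Toolbox. The sketch below requests parallel extraction only when a pool is actually available, using canUseParallelPool (a base MATLAB function, assumed here to be available, R2020b or later). Otherwise it repeats the example above.

```matlab
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);
aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true,barkSpectrum=true,erbSpectrum=true,linearSpectrum=true);

% Use a parallel pool only if Parallel Computing Toolbox can provide one
useParallel = canUseParallelPool;
specs = extract(aFE,ads,SampleRateMismatchRule="resample",UseParallel=useParallel);
```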

The specs variable is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array. The number of hops and the number of channels depend on the length and the number of channels of the corresponding audio file, and the number of features is determined by the features enabled on the audioFeatureExtractor object.

numFiles = numel(specs)
numFiles = 37
[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1
[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})
numHops2 = 1724
numFeaturesFile2 = 620
numChannelsFile2 = 4
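Because each file yields a different number of hops and channels, you cannot stack the cell array directly. One way to pool the feature vectors is to fold the channel dimension into the rows and then concatenate vertically. This sketch assumes that you want to treat every channel as an independent observation. The small zero-filled arrays in demoSpecs are hypothetical placeholders for the contents of specs.

```matlab
% Hypothetical placeholder for specs: two files, 5 features each, 1 and 4 channels
demoSpecs = {zeros(10,5); zeros(7,5,4)};

% Reorder each array to numHops-by-numChannels-by-numFeatures, then fold hops and
% channels into rows, giving a (numHops*numChannels)-by-numFeatures matrix
flatten = @(x) reshape(permute(x,[1 3 2]),[],size(x,2));
pooled = cell2mat(cellfun(flatten,demoSpecs,UniformOutput=false));

size(pooled)   % (10*1 + 7*4)-by-5
```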

Input Arguments


audioIn: Input audio

Input audio, specified as a column vector or matrix of independent channels (columns).

Data Types: single | double
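Because each column is treated as an independent channel, a row vector would be read as many one-sample channels. Integer samples are also outside the supported data types. The base-MATLAB sketch below shows one way to prepare such data before calling extract. It starts from a hypothetical int16 row vector and assumes a 16-bit full scale of 32768.

```matlab
% Hypothetical input: 16-bit integer samples stored as a row vector
raw = int16(round(1e4*sin(2*pi*(0:799)/80)));

% Make it a column (one channel) and convert to a supported floating-point type,
% scaling int16 full scale to the range [-1, 1)
audioIn = single(raw(:))/32768;

fprintf("%d-by-%d %s\n",size(audioIn,1),size(audioIn,2),class(audioIn))
```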

ds: Audio datastore

Audio datastore to extract features from, specified as an audioDatastore object.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: extract(aFE,ds,SampleRateMismatchRule="resample")

UseParallel: Option to read data and extract features in parallel

If you specify true, extract reads the data and extracts features from the audioDatastore using a pool of parallel workers. For more information on parallel pools, see parpool (Parallel Computing Toolbox).

This functionality requires Parallel Computing Toolbox™.

Data Types: logical

SampleRateMismatchRule: Behavior when sample rates do not match

Behavior of the extract function when the sample rate of an audio file in the audioDatastore does not match the sample rate set on the audioFeatureExtractor object, specified as "error", "warn", or "resample".

  • "error" — Error immediately if there is a sample rate mismatch.

  • "warn" — Use the sample rate of the audioFeatureExtractor object and display a warning if the sample rate of any file does not match.

  • "resample" — If there is a mismatch, resample the audio data to match the sample rate of the audioFeatureExtractor object.

Data Types: char | string
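The sketch below demonstrates the "error" and "resample" behaviors. It assumes Audio Toolbox and a writable temporary folder. The file name is hypothetical, and the file is written at 22.05 kHz so that it deliberately mismatches an extractor configured for 44.1 kHz. The last call uses the comma-separated name-value syntax required before R2021a.

```matlab
% Write a 1 s tone at 22.05 kHz (hypothetical file name) to a temporary folder
folder = tempname;
mkdir(folder);
fsFile = 22050;
t = (0:fsFile-1).'/fsFile;
audiowrite(fullfile(folder,"tone22k.wav"),0.5*sin(2*pi*440*t),fsFile);
ads = audioDatastore(folder);

aFE = audioFeatureExtractor(SampleRate=44.1e3,spectralCentroid=true);

% "error": extraction stops because 22.05 kHz does not match 44.1 kHz
errored = false;
try
    extract(aFE,ads,SampleRateMismatchRule="error");
catch err
    errored = true;
    disp(err.message)
end

% "resample": the file is resampled to 44.1 kHz before extraction
% (pre-R2021a syntax, equivalent to SampleRateMismatchRule="resample")
features = extract(aFE,ads,"SampleRateMismatchRule","resample");

rmdir(folder,"s")   % remove the temporary file
```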

Output Arguments


features: Extracted audio features

Extracted audio features, returned as an L-by-M-by-N array, where:

  • L: Number of feature vectors (hops)

  • M: Number of features extracted per analysis window

  • N: Number of channels

If the input is an audioDatastore object, extract returns a cell array where each cell corresponds to an audio file and contains the extracted features from that file.

Data Types: single | double
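The sketch below checks the L-by-M-by-N layout for a two-channel signal and assumes Audio Toolbox. The test tones are arbitrary. M is expected to equal the extractor's FeatureVectorLength property, and N the number of input columns.

```matlab
% Two-channel test signal: one tone per column
fs = 16e3;
t = (0:2*fs-1).'/fs;
audioIn = [sin(2*pi*440*t), 0.5*sin(2*pi*880*t)];

aFE = audioFeatureExtractor(SampleRate=fs, ...
    spectralCentroid=true,spectralKurtosis=true);
features = extract(aFE,audioIn);

[L,M,N] = size(features)   % L hops, M = aFE.FeatureVectorLength, N = 2 channels
```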

Version History

Introduced in R2019b
