identify

Identify label

Syntax

tableOut = identify(ivs,data)

tableOut = identify(ivs,data,scorer)

tableOut = identify(___,NumCandidates=N)

Description

tableOut = identify(ivs,data) identifies the label corresponding to the data.

tableOut = identify(ivs,data,scorer) specifies the scorer used to perform identification.

tableOut = identify(___,NumCandidates=N) specifies the number of candidates to return in tableOut.

Examples

Train Speaker Identification System

This example uses a 1.36 GB subset of the Common Voice data set from Mozilla [1]. The data set contains 48 kHz recordings of subjects speaking short sentences.

Download the data set if it doesn't already exist and unzip it into tempdir.

downloadFolder = matlab.internal.examples.downloadSupportFile("audio","commonvoice.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder);

Ingest the training set using audioDatastore. To speed up this example, keep only 20% of the files for each speaker.

trainTable = readtable(fullfile(dataFolder,"commonvoice","train","train.tsv"),FileType="text",Delimiter="tab");
adsTrain = audioDatastore(append(fullfile(dataFolder,"commonvoice","train","clips",filesep),trainTable.path,".wav"));
idx = splitlabels(trainTable.client_id,0.2);
adsTrain = subset(adsTrain,idx{1});
trainLabels = trainTable.client_id(idx{1});

Ingest the validation set using audioDatastore.

valTable = readtable(fullfile(dataFolder,"commonvoice","validation","validation.tsv"),FileType="text",Delimiter="tab");
valLabels = valTable.client_id;
adsVal = audioDatastore(append(fullfile(dataFolder,"commonvoice","validation","clips",filesep),valTable.path,".wav"));

Split the validation data set into enrollment and test sets. Use two utterances per speaker for enrollment and the remaining utterances for the test set, and exclude any speakers with fewer than five utterances. Generally, the more utterances you use for enrollment, the better the system performs. However, most practical applications are limited to a small set of enrollment utterances.

labelCounts = countlabels(valLabels);

labelsToExclude = labelCounts.Label(labelCounts.Count<5);
idxs = splitlabels(valLabels,2,Exclude=labelsToExclude);

adsEnroll = subset(adsVal,idxs{1});
enrollLabels = valLabels(idxs{1});

adsTest = subset(adsVal,idxs{2});
testLabels = valLabels(idxs{2});

Create an i-vector system that accepts feature input.

fs = 48e3;
iv = ivectorSystem(SampleRate=fs,InputType="features");

Create an audioFeatureExtractor object to extract the gammatone cepstral coefficients (GTCC), the delta GTCC, the delta-delta GTCC, and the pitch from 50 ms periodic Hann windows with 45 ms overlap.

afe = audioFeatureExtractor(...
    SampleRate=fs, ...
    Window=hann(round(0.05*fs),"periodic"), ...
    OverlapLength=round(0.045*fs), ...
    gtcc=true,gtccDelta=true,gtccDeltaDelta=true,pitch=true);

Extract features from the train and enroll datastores.

xTrain = extract(afe,adsTrain);
xEnroll = extract(afe,adsEnroll);

Train both the extractor and classifier using the training set.

trainExtractor(iv,xTrain, ...
    UBMNumComponents=64, ...
    UBMNumIterations=5, ...
    TVSRank=32, ...
    TVSNumIterations=3);

Calculating standardization factors .....done.
Training universal background model ........done.
Training total variability space ......done.
i-vector extractor training complete.

trainClassifier(iv,xTrain,trainLabels, ...
    NumEigenvectors=16, ...
    PLDANumDimensions=16, ...
    PLDANumIterations=5);

Extracting i-vectors ...done.
Training projection matrix .....done.
Training PLDA model ........done.
i-vector classifier training complete.

To calibrate the system so that scores can be interpreted as a measure of confidence in a positive decision, use calibrate.

calibrate(iv,xTrain,trainLabels)

Extracting i-vectors ...done.
Calibrating CSS scorer ...done.
Calibrating PLDA scorer ...done.
Calibration complete.

Enroll the speakers from the enrollment set.

enroll(iv,xEnroll,enrollLabels)

Extracting i-vectors ...done.
Enrolling i-vectors ...................done.
Enrollment complete.

Evaluate the file-level prediction accuracy on the test set.

numCorrect = 0;
reset(adsTest)
for index = 1:numel(adsTest.Files)
    features = extract(afe,read(adsTest));
    
    results = identify(iv,features);
    
    trueLabel = testLabels(index);
    predictedLabel = results.Label(1);
    isPredictionCorrect = trueLabel==predictedLabel;
    
    numCorrect = numCorrect + isPredictionCorrect;
end
display("File Accuracy: " + round(100*numCorrect/numel(adsTest.Files),2) + " (%)")

    "File Accuracy: 97.92 (%)"

References

[1] Mozilla Common Voice

Input Arguments

`ivs` — i-vector system
`ivectorSystem` object

i-vector system, specified as an object of type ivectorSystem.

`data` — Data to identify
column vector | matrix

Data to identify, specified as a column vector representing a single-channel (mono) audio signal or a matrix of audio features.

If InputType is set to "audio" when the i-vector system is created, data must be a column vector with underlying type single or double.
If InputType is set to "features" when the i-vector system is created, data must be a matrix with underlying type single or double. The matrix must consist of audio features where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized.

Data Types: single | double
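
The number of rows in a feature matrix varies because it depends on the signal length. The following sketch is an illustration only: it reuses the 50 ms window and 45 ms overlap from the example above and assumes that only complete windows produce a hop. The durations are hypothetical.

```matlab
% Window settings taken from the example on this page.
fs = 48e3;
winLength = round(0.05*fs);        % 2400 samples
overlapLength = round(0.045*fs);   % 2160 samples
hopLength = winLength - overlapLength;

% Hypothetical signal durations in seconds.
durations = [1 2.5];
numSamples = round(durations*fs);

% Number of complete analysis windows (rows of the feature matrix),
% assuming no padding of the final partial window.
numHops = floor((numSamples - overlapLength)/hopLength);
```

Under these assumptions the two signals yield 191 and 491 rows, while the number of columns stays fixed by the feature configuration.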

`scorer` — Scoring algorithm
`"plda"` | `"css"`

Scoring algorithm used by the i-vector system, specified as "plda", which corresponds to probabilistic linear discriminant analysis (PLDA), or "css", which corresponds to cosine similarity score (CSS).

To use "plda", you must train the PLDA model using trainClassifier. If the PLDA model has been trained, then scorer defaults to "plda". Otherwise, the scorer defaults to "css".

Data Types: char | string
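
As a rough illustration of what a cosine similarity score measures, the following sketch compares one vector against two stored templates. The vectors and the names `templates`, `testVec`, and `labels` are made up for illustration. The i-vector system also projects i-vectors and can calibrate scores, so this is not its implementation.

```matlab
% Hypothetical enrolled templates, one column per label (illustrative values).
templates = [1 0; 0 1; 0 0];
labels = {'speakerA','speakerB'};

% Hypothetical vector derived from the data to identify.
testVec = [0.9; 0.1; 0];

% Cosine similarity: dot product divided by the product of the norms.
scores = zeros(1,numel(labels));
for k = 1:numel(labels)
    t = templates(:,k);
    scores(k) = dot(testVec,t)/(norm(testVec)*norm(t));
end

% The label with the highest similarity is the best candidate.
[~,best] = max(scores);
bestLabel = labels{best};
```

With a trained and enrolled system, you select the scorer by passing it as the third argument, for example identify(ivs,data,"css").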

`N` — Number of candidates
positive scalar

Number of candidates to return in tableOut, specified as a positive scalar.

Note

If you request a number of candidates greater than the number of labels enrolled in the i-vector system, then all candidates are returned. If unspecified, the number of candidates defaults to the number of enrolled labels.

Data Types: single | double
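
The behavior described in the note can be sketched with plain MATLAB, assuming a hypothetical set of three enrolled labels and made-up scores. The `Score` variable name is an assumption. This mimics the documented behavior and is not the function's internal code.

```matlab
% Hypothetical scores for three enrolled labels (illustrative values only).
Label = categorical({'speakerA';'speakerB';'speakerC'});
Score = [0.12; 0.81; 0.07];

N = 5;   % requested number of candidates, more than the enrolled labels

% Sort by descending score, then keep at most N rows.
[Score,order] = sort(Score,'descend');
Label = Label(order);
numRows = min(N,numel(Score));
candidates = table(Label(1:numRows),Score(1:numRows), ...
    'VariableNames',{'Label','Score'});
```

Here `candidates` has three rows even though five were requested. With a real system, the corresponding call is identify(ivs,data,NumCandidates=N).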

Output Arguments

`tableOut` — Score table
table

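Candidate labels and corresponding scores, returned as a table. The number of rows of tableOut is equal to N, the number of candidates, or to the number of enrolled labels if N exceeds it. The candidates are sorted in descending order of confidence, so the first row holds the most likely label.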

Data Types: table

Version History

Introduced in R2021a

R2022a: `identify` throws warning if scores are not calibrated

Starting in R2022a, the identify function throws a warning if the scores from the i-vector system are not calibrated. Use calibrate to calibrate the scores.
