# Pitch Tracking Using Multiple Pitch Estimations and HMM

This example shows how to perform pitch tracking using multiple pitch estimations, octave and median smoothing, and a hidden Markov model (HMM).

### Introduction

Pitch detection is a fundamental building block in speech processing, speech coding, and music information retrieval (MIR). In speech and speaker recognition, pitch is used as a feature in a machine learning system. For call centers, pitch is used to indicate the emotional state and gender of customers. In speech therapy, pitch is used to indicate and analyze pathologies and diagnose physical defects. In MIR, pitch is used to categorize music, for query-by-humming systems, and as a primary feature in song identification systems.

Pitch detection for clean speech is mostly considered a solved problem. Pitch detection with noise and multi-pitch tracking remain difficult problems. There are many algorithms that have been extensively reported on in the literature with known trade-offs between computational cost and robustness to different types of noise.

Usually, a pitch detection algorithm (PDA) estimates the pitch for a given time instant. The pitch estimate is then validated or corrected within a pitch tracking system. Pitch tracking systems enforce continuity of pitch estimates over time.

This example provides an example function, HelperPitchTracker, which implements a pitch tracking system. The example walks through the algorithm implemented by the HelperPitchTracker function.

### Problem Summary

Load an audio file and corresponding reference pitch for the audio file. The reference pitch is reported every 10 ms and was determined as an average of several third-party algorithms on the clean speech file. Regions without voiced speech are represented as nan.

Use the pitch function to estimate the pitch of the audio over time.

[f0,locs] = pitch(x,fs);

Two metrics are commonly reported when defining pitch error: gross pitch error (GPE) and voicing decision error (VDE). Because the pitch algorithms in this example do not provide a voicing decision, only GPE is reported. In this example, GPE is calculated as the percent of pitch estimates outside $±10%$ of the reference pitch over the span of the voiced segments.

Calculate the GPE for regions of speech and plot the results. Listen to the clean audio signal.

isVoiced = ~isnan(truePitch);
f0(~isVoiced) = nan;

p = 0.1;
GPE = mean(abs(f0(isVoiced)-truePitch(isVoiced)) > truePitch(isVoiced).*p).*100;

t = (0:length(x)-1)/fs;
t0 = (locs-1)/fs;
sound(x,fs)

figure(1)
tiledlayout(2,1)
nexttile
plot(t,x)
ylabel('Amplitude')
title('Pitch Estimation of Clean Signal')

nexttile
plot(t0,[truePitch,f0])
legend('Reference','Estimate','Location','NW')
ylabel('F0 (Hz)')
xlabel('Time (s)')
title(sprintf('GPE = %0.1f%%',GPE))

Mix the speech signal with noise at $-5$ dB SNR.

Use the pitch function on the noisy audio to estimate the pitch over time. Calculate the GPE for regions of voiced speech and plot the results. Listen to the noisy audio signal.

desiredSNR = -5;
x = mixSNR(x,rand(size(x)),desiredSNR);

[f0,locs] = pitch(x,fs);
f0(~isVoiced) = nan;
GPE = mean(abs(f0(isVoiced) - truePitch(isVoiced)) > truePitch(isVoiced).*p).*100;

sound(x,fs)

figure(2)
tiledlayout(2,1)

nexttile
plot(t,x)
ylabel('Amplitude')
title('Pitch Estimation of Noisy Signal')

nexttile
plot(t0,[truePitch,f0])
legend('Reference','Estimate','Location','NW')
ylabel('F0 (Hz)')
xlabel('Time (s)')
title(sprintf('GPE = %0.1f%%',GPE))

This example shows how to improve the pitch estimation of noisy speech signals using multiple pitch candidate generation, octave-smoothing, median-smoothing, and an HMM.

The algorithm described in this example is implemented in the example function HelperPitchTracker. To learn about the HelperPitchTracker function, enter help HelperPitchTracker at the command line.

help HelperPitchTracker
HelperPitchTracker Track the fundamental frequency of audio signal
f0 = HelperPitchTracker(audioIn,fs) returns an estimate of the
fundamental frequency contour for the audio input. Columns of the
input are treated as individual channels. The HelperPitchTracker
function uses multiple pitch detection algorithms to generate pitch
candidates, and uses octave smoothing and a Hidden Markov Model to
return an estimate of the fundamental frequency.

f0 = HelperPitchTracker(...,'HopLength',HOPLENGTH) specifies the number
of samples in each hop. The pitch estimate is updated every hop.
Specify HOPLENGTH as a scalar integer. If unspecified, HOPLENGTH
defaults to round(0.01*fs).

f0 = HelperPitchTracker(...,'OctaveSmoothing',TF) specifies whether or
not to apply octave smoothing. Specify as true or false. If
unspecified, TF defaults to true.

f0 = HelperPitchTracker(...,'EmissionMatrix',EMISSIONMATRIX) specifies
the emission matrix used for the HMM during the forward pass. The
default emission matrix was trained on the Pitch Tracking Database from
Graz University of Technology. The database consists of 4720 speech
segments with corresponding pitch trajectories derived from
laryngograph signals. The emission matrix corresponds to the
probability that a speaker leaves one pitch state to another, in the
range [50, 400] Hz. Specify the emission matrix such that rows
correspond to the current state, columns correspond to the possible
future state, and the values of the matrix correspond to the
probability of moving from the current state to the future state. If
you specify your own emission matrix, specify its corresponding
EMISSIONMATRIXRANGE. EMISSIONMATRIX must be a real N-by-N matrix of
integers.

f0 = HelperPitchTracker(...,'EmissionMatrixRange',EMISSIONMATRIXRANGE)
specifies how the EMISSIONMATRIX corresponds to Hz. If unspecified,
EMISSIONMATRIXRANGE defaults to 50:400.

[f0,loc] = HelperPitchTracker(...) returns the locations associated
with each pitch decision. The locations correspond to the ceiling of
the center of the analysis frames.

[f0,loc,hr] = HelperPitchTracker(...) returns the harmonic ratio
associated with each pitch decision.

### Description of Pitch Tracking System

The graphic provides an overview of the pitch tracking system implemented in the example function. The following code walks through the internal workings of the HelperPitchTracker example function.

### 1. Generate Multiple Pitch Candidates

In the first stage of the pitch tracking system, you generate multiple pitch candidates using multiple pitch detection algorithms. The primary pitch candidates, which are generally more accurate, are generated using algorithms based on the Summation of Residual Harmonics (SRH) [2] algorithm and the Pitch Estimation Filter with Amplitude Compression (PEFAC) [3] algorithm.

Buffer the noisy input signal into overlapped frames, and then use audio.internal.pitch.SRH to generate 5 pitch candidates for each hop. Also return the relative confidence of each pitch candidate. Plot the results.

RANGE = [50,400];
HOPLENGTH = round(fs.*0.01);

% Buffer into required sizes
xBuff_SRH = buffer(x,round(0.025*fs),round(0.02*fs),'nodelay');

% Define pitch parameters
params_SRH = struct('Method','SRH', ...
'Range',RANGE, ...
'WindowLength',round(fs*0.06), ...
'OverlapLength',round(fs*0.06-HOPLENGTH), ...
'SampleRate',fs, ...
'NumChannels',size(x,2), ...
'SamplesPerChannel',size(x,1));
multiCandidate_params_SRH = struct('NumCandidates',5,'MinPeakDistance',1);

% Get pitch estimate and confidence
[f0_SRH,conf_SRH] = audio.internal.pitch.SRH(xBuff_SRH,x, ...
params_SRH, ...
multiCandidate_params_SRH);
figure(3)
tiledlayout(2,1)
nexttile
plot(t0,f0_SRH)
ylabel('F0 Candidates (Hz)')
title('Multiple Candidates from SRH Pitch Estimation')
nexttile
plot(t0,conf_SRH)
ylabel('Relative Confidence')
xlabel('Time (s)')

Generate an additional set of primary pitch candidates and associated confidence using the PEF algorithm. Generate backup candidates and associated confidences using the normalized correlation function (NCF) algorithm and cepstrum pitch determination (CEP) algorithm. Log only the most confident estimate from the backup candidates.

xBuff_PEF = buffer(x,round(0.06*fs),round(0.05*fs),'nodelay');
params_PEF = struct('Method','PEF', ...
'Range',RANGE, ...
'WindowLength',round(fs*0.06), ...
'OverlapLength',round(fs*0.06-HOPLENGTH), ...
'SampleRate',fs, ...
'NumChannels',size(x,2), ...
'SamplesPerChannel',size(x,1));
multiCandidate_params_PEF = struct('NumCandidates',5,'MinPeakDistance',5);
[f0_PEF,conf_PEF] = audio.internal.pitch.PEF(xBuff_PEF, ...
params_PEF, ...
multiCandidate_params_PEF);

xBuff_NCF = buffer(x,round(0.04*fs),round(0.03*fs),'nodelay');
xBuff_NCF = xBuff_NCF(:,2:end-1);
params_NCF = struct('Method','NCF', ...
'Range',RANGE, ...
'WindowLength',round(fs*0.04), ...
'OverlapLength',round(fs*0.04-HOPLENGTH), ...
'SampleRate',fs, ...
'NumChannels',size(x,2), ...
'SamplesPerChannel',size(x,1));
multiCandidate_params_NCF = struct('NumCandidates',5,'MinPeakDistance',1);
f0_NCF = audio.internal.pitch.NCF(xBuff_NCF, ...
params_NCF, ...
multiCandidate_params_NCF);

xBuff_CEP = buffer(x,round(0.04*fs),round(0.03*fs),'nodelay');
xBuff_CEP = xBuff_CEP(:,2:end-1);
params_CEP = struct('Method','CEP', ...
'Range',RANGE, ...
'WindowLength',round(fs*0.04), ...
'OverlapLength',round(fs*0.04-HOPLENGTH), ...
'SampleRate',fs, ...
'NumChannels',size(x,2), ...
'SamplesPerChannel',size(x,1));
multiCandidate_params_CEP = struct('NumCandidates',5,'MinPeakDistance',1);
f0_CEP = audio.internal.pitch.CEP(xBuff_CEP, ...
params_CEP, ...
multiCandidate_params_CEP);

BackupCandidates = [f0_NCF(:,1),f0_CEP(:,1)];

### 2. Determine Long-Term Median

The long-term median of the pitch candidates is used to reduce the number of pitch candidates. To calculate the long-term median pitch, first calculate the harmonic ratio. Pitch estimates are only valid in regions of voiced speech, where the harmonic ratio is high.

hr = harmonicRatio(xBuff_PEF,fs, ...
'Window',hamming(size(xBuff_NCF,1),'periodic'), ...
'OverlapLength',0);

figure(4)
tiledlayout(2,1)
nexttile
plot(t,x)
ylabel('Amplitude')
nexttile
plot(t0,hr)
ylabel('Harmonic Ratio')
xlabel('Time (s)')

Use the harmonic ratio to threshold out regions that do not include voiced speech in the long-term median calculation. After determining the long-term median, calculate lower and upper bounds for pitch candidates. In this example, the lower and upper bounds were determined empirically as 2/3 and 4/3 the median pitch. Candidates outside of these bounds are penalized in the following stage.

idxToKeep = logical(movmedian(hr>((3/4)*max(hr)),3));
longTermMedian = median([f0_PEF(idxToKeep,1);f0_SRH(idxToKeep,1)]);
lower = max((2/3)*longTermMedian,RANGE(1));
upper = min((4/3)*longTermMedian,RANGE(2));

figure(5)
tiledlayout(1,1)
nexttile
plot(t0,[f0_PEF,f0_SRH])
hold on
plot(t0,longTermMedian.*ones(size(f0_PEF,1)),'r:','LineWidth',3)
plot(t0,upper.*ones(size(f0_PEF,1)),'r','LineWidth',2)
plot(t0,lower.*ones(size(f0_PEF,1)),'r','LineWidth',2)
hold off
xlabel('Time (s)')
ylabel('Frequency (Hz)')
title('Long Term Median')

### 3. Candidate Reduction

By default, candidates returned by the pitch detection algorithm are sorted in descending order of confidence. Decrease the confidence of any primary candidate outside the lower and upper bounds. Decrease the confidence by a factor of 10. Re-sort the candidates for both the PEF and SRH algorithms so they are in descending order of confidence. Concatenate the candidates, keeping only the two most confident candidates from each algorithm.

Plot the reduced candidates.

invalid = f0_PEF>lower | f0_PEF>upper;
conf_PEF(invalid) = conf_PEF(invalid)/10;
[conf_PEF,order] = sort(conf_PEF,2,'descend');
for i = 1:size(f0_PEF,1)
f0_PEF(i,:) = f0_PEF(i,order(i,:));
end

invalid = f0_SRH>lower | f0_SRH>upper;
conf_SRH(invalid) = conf_SRH(invalid)/10;
[conf_SRH,order] = sort(conf_SRH,2,'descend');
for i = 1:size(f0_SRH,1)
f0_SRH(i,:) = f0_SRH(i,order(i,:));
end

candidates = [f0_PEF(:,1:2),f0_SRH(:,1:2)];
confidence = [conf_PEF(:,1:2),conf_SRH(:,1:2)];

figure(6)
plot(t0,candidates)
xlabel('Time (s)')
ylabel('Frequency (Hz)')
title('Reduced Candidates')

### 4. Make Distinctive

If two or more candidates are within a given 5 Hz span, set the candidates to their mean and sum their confidence.

span = 5;
confidenceFactor = 1;
for r = 1:size(candidates,1)
for c = 1:size(candidates,2)
tf = abs(candidates(r,c)-candidates(r,:)) < span;
candidates(r,c) = mean(candidates(r,tf));
confidence(r,c) = sum(confidence(r,tf))*confidenceFactor;
end
end
candidates = max(min(candidates,400),50);

### 5. Forward Iteration of HMM with Octave Smoothing

Now that the candidates have been reduced, you can feed them into an HMM to enforce continuity constraints. Pitch contours are generally continuous for speech signals when analyzed in 10 ms hops. The probability of a pitch moving from one state to another across time is referred to as the emission probability. Emission probabilities can be encapsulated into a matrix which describes the probability of going from any pitch value in a set range to any other in a set range. The emission matrix used in this example was created using the Graz database. [1]

Load the emission matrix and associated range. Plot the probability density function (PDF) of a pitch in 150 Hz state.

currentState = 150;

figure(7)
plot(emissionMatrixRange(1):emissionMatrixRange(2),emissionMatrix(currentState - emissionMatrixRange(1)-1,:))
title(sprintf('Emission PDF for %d Hz',currentState))
xlabel('Next Pitch Value (Hz)')
ylabel('Probability')

The HMM used in this example combines the emission probabilities, which enforce continuity, and the relative confidence of the pitch. At each hop, the emission probabilities are combined with the relative confidence to create a confidence matrix. A best choice for each path is determined as the max of the confidence matrix. The HMM used in this example also assumes that only one path can be assigned to a given state (an assumption of the Viterbi algorithm).

In addition to the HMM, this example monitors for octave jumps relative to the short-term median of the pitch paths. If an octave jump is detected, then the backup pitch candidates are added as options for the HMM.

% Preallocation
numPaths = 4;
numHops = size(candidates,1);
logbook = zeros(numHops,3,numPaths);
suspectHops = zeros(numHops,1);

% Forward iteration with octave-smoothing
for hopNumber = 1:numHops
nowCandidates = candidates(hopNumber,:);
nowConfidence = confidence(hopNumber,:);

% Remove octave jumps
if hopNumber > 100
numCandidates = numel(nowCandidates);

% Weighted short term median
medianWindowLength = 50;
aTemp = logbook(max(hopNumber-min(hopNumber,medianWindowLength),1):hopNumber-1,1,:);
shortTermMedian = median(aTemp(:));
previousM = mean([longTermMedian,shortTermMedian]);
lowerTight = max((4/3)*previousM,emissionMatrixRange(1));
upperTight = min((2/3)*previousM,emissionMatrixRange(2));
numCandidateOutside = sum([nowCandidates < lowerTight, nowCandidates > upperTight]);

% If at least 1 candidate is outside the octave centered on the
% short-term median, add the backup pitch candidates that were
% generated by the normalized correlation function and cepstrum
% pitch determination algorithms as potential candidates.
if numCandidateOutside > 0
% Apply the backup pitch estimators
nowCandidates = [nowCandidates,BackupCandidates(hopNumber,:)];%#ok<AGROW>
nowConfidence = [nowConfidence,repmat(mean(nowConfidence),1,2)];%#ok<AGROW>

% Make distinctive
span = 10;
confidenceFactor = 1.2;
for r = 1:size(nowCandidates,1)
for c = 1:size(nowCandidates,2)
tf = abs(nowCandidates(r,c)-nowCandidates(r,:)) < span;
nowCandidates(r,c) = mean(nowCandidates(r,tf));
nowConfidence(r,c) = sum(nowConfidence(r,tf))*confidenceFactor;
end
end
end
end

% Create confidence matrix
confidenceMatrix = zeros(numel(nowCandidates),size(logbook,3));
for pageIdx = 1:size(logbook,3)
if hopNumber ~= 1
pastPitch = floor(logbook(hopNumber-1,1,pageIdx)) - emissionMatrixRange(1) + 1;
else
pastPitch = nan;
end
for candidateNumber = 1:numel(nowCandidates)
if hopNumber ~= 1
% Get the current pitch and convert to an index in the range
currentPitch = floor(nowCandidates(candidateNumber)) - emissionMatrixRange(1) + 1;
confidenceMatrix(candidateNumber,pageIdx) = ...
emissionMatrix(currentPitch,pastPitch)*logbook(hopNumber-1,2,pageIdx)*nowConfidence(candidateNumber);
else
confidenceMatrix(candidateNumber,pageIdx) = 1;
end
end
end

% Assign an estimate for each path
for pageIdx = 1:size(logbook,3)
% Determine most confident transition from past to current pitch
[~,idx] = max(confidenceMatrix(:));

% Convert confidence matrix index to pitch and logbook page
[chosenPitch,pastPitchIdx] = ind2sub([numel(nowCandidates),size(logbook,3)],idx);

logbook(hopNumber,:,pageIdx) = ...
[nowCandidates(chosenPitch), ...
confidenceMatrix(chosenPitch,pastPitchIdx), ...
pastPitchIdx];

% Remove the chosen current pitch from the confidence matrix
confidenceMatrix(chosenPitch,:) = NaN;
end
% Normalize confidence
logbook(hopNumber,2,:) = logbook(hopNumber,2,:)/sum(logbook(hopNumber,2,:));
end

### 6. Traceback of HMM

Once the forward iteration of the HMM is complete, the final pitch contour is chosen as the most confident path. Walk backward through the log book to determine the pitch contour output by the HMM. Calculate the GPE and plot the new pitch contour and the known contour.

numHops = size(logbook,1);
keepLooking = true;
index = numHops + 1;

while keepLooking
index = index - 1;
if abs(max(logbook(index,2,:))-min(logbook(index,2,:)))~=0
keepLooking = false;
end
end

[~,bestPathIdx] = max(logbook(index,2,:));
bestIndices = zeros(numHops,1);
bestIndices(index) = bestPathIdx;

for ii = index:-1:2
bestIndices(ii-1) = logbook(ii,3,bestIndices(ii));
end

bestIndices(bestIndices==0) = 1;
f0 = zeros(numHops,1);
for ii = (numHops):-1:2
f0(ii) = logbook(ii,1,bestIndices(ii));
end

f0toPlot = f0;
f0toPlot(~isVoiced) = NaN;
GPE = mean( abs(f0toPlot(isVoiced) - truePitch(isVoiced)) > truePitch(isVoiced).*p).*100;
figure(8)
plot(t0,[truePitch,f0toPlot])
legend('Reference','Estimate')
ylabel('F0 (Hz)')
xlabel('Time (s)')
title(sprintf('GPE = %0.1f%%',GPE))

### 7. Moving Median Filter

As a final post-processing step, apply a moving median filter with a window length of three hops. Calculate the final GPE and plot the final pitch contour and the known contour.

f0 = movmedian(f0,3);
f0(~isVoiced) = NaN;

GPE = mean(abs(f0(isVoiced) - truePitch(isVoiced)) > truePitch(isVoiced).*p).*100;
figure(9)
plot(t0,[truePitch,f0])
legend('Reference','Estimate')
ylabel('F0 (Hz)')
xlabel('Time (s)')
title(sprintf('GPE = %0.1f%%',GPE))

### Performance Evaluation

The HelperPitchTracker function uses an HMM to apply continuity constraints to pitch contours. The emission matrix of the HMM can be set directly. It is best to train the emission matrix on sound sources similar to the ones you want to track.

This example uses the Pitch Tracking Database from Graz University of Technology (PTDB-TUG) [4]. The data set consists of 20 English native speakers reading 2342 phonetically rich sentences from the TIMIT corpus. Download and extract the data set. Depending on your system, downloading and extracting the data set can take approximately 1.5 hours.

url = 'https://www2.spsc.tugraz.at/databases/PTDB-TUG/SPEECH_DATA_ZIPPED.zip';

if ~exist(datasetFolder,'dir')
unzip(url,datasetFolder)
end

Create an audio datastore that points to the microphone recordings in the database. Set the label associated with each file to the location of the associated known pitch file. The dataset contains recordings of 10 female and 10 male speakers. Use subset to isolate the 10th female and male speakers. Train an emission matrix based on the reference pitch contours for both male and female speakers 1 through 9.

ads = audioDatastore([fullfile(datasetFolder,"SPEECH DATA","FEMALE","MIC"),fullfile(datasetFolder,"SPEECH DATA","MALE","MIC")], ...
'IncludeSubfolders',true, ...
'FileExtensions','.wav');

Shuffle the audio datastores.

The emission matrix describes the probability of going from one pitch state to another. In the following step, you create an emission matrix based on speakers 1 through 9 for both male and female. The database stores reference pitch values, short-term energy, and additional information in the text files with files extension f0. The getReferencePitch function reads in the pitch values if the short-term energy is above a threshold. The threshold was determined empirically in listening tests. The HelperUpdateEmissionMatrix creates a 2-dimensional histogram based on the current pitch state and the next pitch state. After the histogram is created, it is normalized to create an emission matrix.

emissionMatrixRange = [50,400];
emissionMatrix = [];

emissionMatrix = HelperUpdateEmissionMatrix(x,emissionMatrixRange,emissionMatrix);
end
emissionMatrix = emissionMatrix + sqrt(eps);
emissionMatrix = emissionMatrix./norm(emissionMatrix);

Define different types of background noise: white, ambiance, engine, jet, and street. Resample them to 16 kHz to help speed up testing the database.

Define the SNR to test, in dB, as 10, 5, 0, -5, and -10.

noiseType = {'white','ambiance','engine','jet','street'};
numNoiseToTest = numel(noiseType);

desiredFs = 16e3;

whiteNoiseMaker = dsp.ColoredNoise('Color','white','SamplesPerFrame',40000,'RandomStream','mt19937ar with seed','BoundedOutput',true);
noise{1} = whiteNoiseMaker();
noise{2} = resample(ambiance,desiredFs,ambianceFs);
noise{3} = resample(engine,desiredFs,engineFs);
noise{4} = resample(jet,desiredFs,jetFs);
noise{5} = resample(street,desiredFs,streetFs);

snrToTest = [10,5,0,-5,-10];
numSNRtoTest = numel(snrToTest);

Run the pitch detection algorithm for each SNR and noise type for each file. Calculate the average GPE across speech files. This example compares performance with the popular pitch tracking algorithm: Sawtooth Waveform Inspired Pitch Estimator (SWIPE). A MATLAB® implementation of the algorithm can be found at [5]. To run this example without comparing to other algorithms, set compare to false. The following comparison takes around 15 minutes.

compare = true;
numFilesToTest = 20;
p = 0.1;
GPE_pitchTracker = zeros(numSNRtoTest,numNoiseToTest,numFilesToTest);
if compare
GPE_swipe = GPE_pitchTracker;
end
for i = 1:numFilesToTest
cleanSpeech = resample(cleanSpeech,desiredFs,info.SampleRate);

truePitch = getReferencePitch(info.Label{:});
isVoiced = truePitch~=0;
truePitchInVoicedRegions = truePitch(isVoiced);

for j = 1:numSNRtoTest
for k = 1:numNoiseToTest
noisySpeech = mixSNR(cleanSpeech,noise{k},snrToTest(j));
f0 = HelperPitchTracker(noisySpeech,desiredFs,'EmissionMatrix',emissionMatrix,'EmissionMatrixRange',emissionMatrixRange);
f0 = [0;f0]; % manual alignment with database.
GPE_pitchTracker(j,k,i) = mean(abs(f0(isVoiced) - truePitchInVoicedRegions) > truePitchInVoicedRegions.*p).*100;

if compare
f0 = swipep(noisySpeech,desiredFs,[50,400],0.01);
f0 = f0(3:end); % manual alignment with database.
GPE_swipe(j,k,i) = mean(abs(f0(isVoiced) - truePitchInVoicedRegions) > truePitchInVoicedRegions.*p).*100;
end
end
end
end
GPE_pitchTracker = mean(GPE_pitchTracker,3);

if compare
GPE_swipe = mean(GPE_swipe,3);
end

Plot the gross pitch error for each noise type.

for ii = 1:numel(noise)
figure(9+ii)
plot(snrToTest,GPE_pitchTracker(:,ii),'b')
hold on
if compare
plot(snrToTest,GPE_swipe(:,ii),'g')
end
plot(snrToTest,GPE_pitchTracker(:,ii),'bo')
if compare
plot(snrToTest,GPE_swipe(:,ii),'gv')
end
title(noiseType(ii))
xlabel('SNR (dB)')
ylabel(sprintf('Gross Pitch Error (p = %0.2f)',p))
if compare
legend('HelperPitchTracker','SWIPE')
else
legend('HelperPitchTracker')
end
grid on
hold off
end

### Conclusion

You can use HelperPitchTracker as a baseline for evaluating GPE performance of your pitch tracking system, or adapt this example to your application.

### References

[1] G. Pirker, M. Wohlmayr, S. Petrik, and F. Pernkopf, "A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario", Interspeech, pp. 1509-1512, 2011.

[2] Drugman, Thomas, and Abeer Alwan. "Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics." Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2011, pp. 1973-1976.

[3] Gonzalez, Sira, and Mike Brookes. "A Pitch Estimation Filter robust to high levels of noise (PEFAC)." 19th European Signal Processing Conference. Barcelona, 2011, pp. 451-455.

[4] Signal Processing and Speech Communication Laboratory. Accessed September 26, 2018. https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html.

[5] "Arturo Camacho." Accessed September 26, 2018. https://www.cise.ufl.edu/~acamacho/english/.

[6] "Fxpefac." Description of Fxpefac. Accessed September 26, 2018. http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html.

Get trial now