How to assess the frequency stability of sung notes while keeping frequency precision when a vibrato (frequency modulation) may be present
Jonas on 27 Dec 2021
Commented: William Rose on 18 Jan 2022
I have several recordings of a sequence of tones with increasing frequency. The sequence is repeated and sung at different tempi (about 60 bpm up to 240 bpm). I am interested not only in a precise estimate of each tone's fundamental frequency, but also in each tone's stability.
To estimate the fundamental frequency precisely, I analyze each tone with as long a time window as possible, since the FFT bin width is the reciprocal of the window length.
Concerning the stability of the sung tones, I see two problems. First, since the tones have different lengths, the time windows and FFT bins have different lengths and widths, so the bandwidth of, e.g., the fundamental-frequency peak is not comparable between tones.
The second problem is the possible presence of a vibrato, which is a frequency modulation (a periodic, roughly sinusoidal jitter in frequency) that broadens the peak in the frequency domain, and whose influence I want to remove. In other words, I am only interested in non-periodic changes in frequency within each sung tone. My only idea for removing the vibrato is to estimate the fundamental frequency with a really short window, look at the resulting sequence of frequency estimates, and remove the vibrato with a low-pass filter (the vibrato rate can be, e.g., 8 Hz). But by analyzing with such short windows I lose precision in the overall fundamental frequency.
I hope my basic idea and problem are clear, and I would be happy to hear ideas for solving this dilemma.
EDIT: I now added a sample file in which many tones contain a vibrato
William Rose on 30 Dec 2021
@Jonas, I put my notebook computer on a piano and recorded while I played the notes CDEFGA. File attached. I used this file as an example.
The code reads the file and generates a spectrogram. The value twin=0.1 sets the time-frequency resolution. This value is the window width in seconds used for the spectrogram. The window is moved across the signal in steps equal to half of twin, and the spectrum is computed at each step. The frequency resolution of the spectrum is 1/twin. You can get more time resolution by reducing twin, but the spectral resolution gets worse, and vice versa if you increase twin.
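The attached script is not reproduced here. As a rough illustration of how twin sets both resolutions, here is a minimal sketch on a synthetic two-tone signal. The sampling rate, the test tones, and the Hann window are my assumptions, not values taken from the recording.

```matlab
% Synthetic stand-in for the recording: two 0.5 s tones (262 Hz, then 294 Hz)
fs   = 8000;                          % sampling rate in Hz (assumed)
t    = (0:fs/2-1)'/fs;                % 0.5 s time axis
x    = [sin(2*pi*262*t); sin(2*pi*294*t)];

twin = 0.1;                           % window width in seconds
nwin = round(twin*fs);                % window length in samples
nov  = floor(nwin/2);                 % overlap, so the window advances by twin/2
[S,f,tt] = spectrogram(x, hann(nwin), nov, nwin, fs);

df = f(2) - f(1);                     % frequency bin spacing, equals 1/twin
dt = tt(2) - tt(1);                   % time step between slices, equals twin/2
fprintf('df = %.1f Hz, dt = %.3f s\n', df, dt);
```

Halving twin halves dt and doubles df, which is the trade-off described above.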
The script also generates a plot of volume versus time.
The user observes this plot and selects a volume threshold for frequency estimation. I chose 25 dB in this case - see dashed line on plot. Frequencies will be estimated only at times when volume>=threshold. The user also specifies the max frequency to consider. In this case, my highest note was A440, so I chose fmax=500 Hz. The script finds the frequency (<=fmax) with maximum intensity at every time point where volume exceeds threshold. The results are plotted on a second spectrogram plot.
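A minimal sketch of the thresholding and peak-picking step, again on synthetic data. The volume definition (total power per time slice in dB, arbitrary reference) and the relative threshold are assumptions made for illustration. The 25 dB value above applies to my recording, not to this signal.

```matlab
fs = 8000;  t = (0:fs/2-1)'/fs;            % assumed sampling rate, 0.5 s per segment
x  = [sin(2*pi*262*t); zeros(fs/2,1); sin(2*pi*440*t)];   % tone, silence, tone
x  = x + 1e-3*randn(size(x));              % small noise floor

twin = 0.1;  nwin = round(twin*fs);
[S,f,tt] = spectrogram(x, hann(nwin), floor(nwin/2), nwin, fs);
P = abs(S).^2;                             % power in each time-frequency cell

volDb  = 10*log10(sum(P,1));               % per-slice volume in dB (arbitrary reference)
thresh = max(volDb) - 20;                  % hypothetical threshold: 20 dB below loudest slice
fmax   = 500;                              % highest frequency to consider, Hz

keep     = f <= fmax;                      % restrict the search to f <= fmax
fk       = f(keep);
[~,imax] = max(P(keep,:), [], 1);          % strongest bin in each time slice
fpeak    = fk(imax);                       % peak frequency per slice
fpeak(volDb < thresh) = NaN;               % no estimate where the signal is too quiet
```

Plotting fpeak against tt gives the frequency-versus-time trace, with gaps wherever the volume is below threshold.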
The results are also plotted on a 2D plot of frequency versus time.
You can improve the precision of the frequency estimate by fitting a parabola to the spectrum at each maximum point plus the point on either side of it, within every time slice. The peak of the parabola will usually be at a frequency that lies between the discrete frequencies of the FFT, so the estimate is no longer limited to the FFT bin spacing. Here is a routine which demonstrates how to do this.
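The attached routine is not reproduced here. The following is a minimal sketch of the same three-point idea on a single windowed frame, not the original code. Fitting the parabola to the log magnitude and using a Hann window are my choices. The test frequency and sampling rate are assumed values.

```matlab
fs = 8000;  N = 800;                       % assumed sampling rate; N/fs = 0.1 s window
n  = (0:N-1)';
f0 = 443.7;                                % true frequency, deliberately between bins
w  = 0.5 - 0.5*cos(2*pi*n/N);              % Hann window written out (no toolbox needed)
X  = abs(fft(sin(2*pi*f0*n/fs) .* w));
X  = X(1:N/2+1);                           % one-sided magnitude spectrum
df = fs/N;                                 % bin spacing: 10 Hz

[~,k] = max(X);                            % index of the largest bin
y  = log(X(k-1:k+1));                      % log magnitude of the peak and its two neighbours
p  = 0.5*(y(1) - y(3)) / (y(1) - 2*y(2) + y(3));   % parabola vertex offset, in bins
fCoarse = (k-1)*df;                        % nearest-bin estimate
fFine   = (k-1+p)*df;                      % parabola-vertex estimate
fprintf('nearest bin: %.1f Hz, interpolated: %.2f Hz\n', fCoarse, fFine);
```

The sketch assumes the peak is not in the first or last bin, so that both neighbours exist. The interpolated value is still an estimate with a small window-dependent bias, but it is much closer to f0 than the nearest bin.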
You can also decide how you want to analyze the frequency data for each note, to assess "quality" or other features of each note. You can try different values for twin.
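As one possible per-note analysis, here is a minimal sketch of the low-pass idea from the question: average the frequency track over one vibrato period so that the periodic part cancels and only the slow, non-periodic change remains. The frame rate, the vibrato rate, and the synthetic track are all assumptions. With real data the vibrato rate would have to be estimated first, and the cancellation is only exact when the averaging window spans a whole number of vibrato periods.

```matlab
fr   = 100;                                  % assumed frame rate of the frequency track, Hz
t    = (0:fr-1)'/fr;                         % one second of track
fvib = 5;                                    % assumed vibrato rate, Hz
f0   = 440 + 2*t + 6*sin(2*pi*fvib*t);       % synthetic track: slow drift plus vibrato

k     = round(fr/fvib);                      % samples in one vibrato period (20)
fslow = movmean(f0, k, 'Endpoints', 'discard');   % mean over one period cancels the vibrato
drift = max(fslow) - min(fslow);             % non-periodic change that remains, in Hz
fprintf('mean %.2f Hz, drift %.2f Hz\n', mean(fslow), drift);
```

The 'Endpoints','discard' option drops the partial windows at the start and end of the note, where the averaging would not span a full period.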
More Answers (1)
William Rose on 28 Dec 2021
Edited: William Rose on 28 Dec 2021
Are you willing to post an example file of such a recording?
Do you have an external reference signal, like a trigger, for when the pitch is (supposed to) change? If so, consider using it to partition the signal.
240 bpm is fast! You'd be lucky to get 1/8 of a second on each note at that speed, plus 1/8 second in between notes to change. And as you know, if the recording is only 1/8 of a second long, then the frequency resolution of the simple FFT is 8 Hz.
I would experiment with the short-time Fourier transform - stft(). See the Matlab help here. I would choose a time resolution in the range (0.5 to 1)/(singer's beats per second) to get started.
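The arithmetic behind that number can be sketched as follows. The assumption that half of each beat is sounded comes from the 240 bpm case above. Applying it to the slower tempi is only for illustration.

```matlab
bpm     = [60 120 240];            % tempi spanning the range in the question
beatSec = 60 ./ bpm;               % seconds per beat
noteSec = beatSec / 2;             % assumption: half of each beat is sounded
dfHz    = 1 ./ noteSec;            % FFT bin spacing for a window spanning the whole note
disp([bpm; noteSec; dfHz]');       % columns: bpm, note length (s), resolution (Hz)
```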
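A minimal sketch of that starting point, using the lower end of the suggested range at 240 bpm. The sampling rate, the two test tones, and the 50% overlap are assumptions. The real recording would replace x.

```matlab
fs  = 8000;                                   % assumed sampling rate
bps = 240/60;                                 % singer's beats per second at 240 bpm
t   = (0:round(fs/bps)-1)'/fs;                % one beat of signal per tone
x   = [sin(2*pi*330*t); sin(2*pi*392*t)];     % synthetic two-tone stand-in

twin = 0.5/bps;                               % lower end of the suggested range: 0.125 s
nwin = round(twin*fs);
[s,f,tt] = stft(x, fs, 'Window', hann(nwin,'periodic'), ...
    'OverlapLength', floor(nwin/2), 'FFTLength', nwin);

pos      = f >= 0;                            % stft is two-sided by default; keep f >= 0
fp       = f(pos);
[~,imax] = max(abs(s(pos,:)), [], 1);         % strongest bin per time slice
fpeak    = fp(imax(:));                       % peak frequency per slice, Hz
disp([tt(:), fpeak(:)]);                      % columns: time (s), peak frequency (Hz)
```

With twin = 0.125 s the bin spacing is 8 Hz, so the peak frequencies are only accurate to the nearest bin unless they are refined further.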
I am skeptical that low passing to eliminate vibrato will help, but give it a try.
Also, check out MUSIC (MUltiple SIgnal Classification) on the Matlab File Exchange.
Google "frequency estimation from short samples" and you will get a lot of journal articles that look promising.