Voice Recognition - how to add a threshold value?
Lakitha Hiran
on 12 Jan 2014
Answered: Lakitha Hiran
on 15 Jan 2014
Dear friends,
I have a problem with a MATLAB project that identifies spoken digits and displays the numbers 0-9 using a Linear Predictive Coding (LPC) approach. The recorded voices are stored in a database, and a neural network is trained on the LPC coefficients stored in that database. Do I have to use a threshold value for this problem?
For example, when a user says "zero" it displays 0, and "five" gives 5. The code is given below. The problem I am facing now is that even if I say a word like "eat", it still displays a number, which should not happen. How can I solve this and display NOT FOUND instead? I would really appreciate it if someone could help me with this.
Code:
clc;
clear all;
load('voicetrainfinal.mat');
Fs=8000;
for l=1:20
    clear y1 y2 y3;
    display('Press ENTER to record your voice !');
    pause();
    x=wavrecord(Fs,Fs); % record Fs samples at Fs Hz, i.e. 1 second
    % keep only the samples whose magnitude exceeds t
    t=0.04;
    j=1;
    for i=1:8000
        if(abs(x(i))>t)
            y1(j)=x(i);
            j=j+1;
        end
    end
    y2=y1/(max(abs(y1))); % normalize to unit peak amplitude
    y3=[y2,zeros(1,3120-length(y2))]; % zero-pad up to 3120 samples
    y=filter([1 -0.9],1,y3'); % high pass filter to boost the high frequency components
    % frame blocking
    blocklen=240; % 30 ms block
    overlap=80;
    block(1,:)=y(1:240);
    for i=1:18
        block(i+1,:)=y(i*160:(i*160+blocklen-1));
    end
    w=hamming(blocklen);
    for i=1:19
        a=xcorr((block(i,:).*w'),12); % finding auto correlation from lag -12 to 12
        for j=1:12
            auto(j,:)=fliplr(a(j+1:j+12)); % forming autocorrelation matrix from lag 0 to 11
        end
        z=fliplr(a(1:12)); % forming a column matrix of autocorrelations for lags 1 to 12
        alpha=pinv(auto)*z';
        lpc(:,i)=alpha; % 12 LPC coefficients for frame i
    end
    wavplay(x,Fs);
    X1=reshape(lpc,1,228); % 12 coefficients x 19 frames = 228 features
    a1=sigmoid(Theta1*[1;X1']); % hidden layer activations
    h=sigmoid(Theta2*[1;a1]); % output layer activations
    m=max(h);
    p1=find(h==m);
    if(p1==10) % output unit 10 stands for the digit 0
        P=0
    else
        P=p1
    end
end
Accepted Answer
Greg Heath
on 15 Jan 2014
Consider this forced classification example
rng(0)
trueclassindices = [ randperm(10),randperm(10)]-1
% [ 5 2 6 7 4 0 1 3 8 9 5 0 6 3 8 4 7 2 9 1 ]
targets = full( ind2vec(trueclassindices+1) )
The targets are columns of the 10-dimensional unit matrix eye(10).
The inverse relation between the column vectors and the class indices is given by
trueclassindices = vec2ind(targets)-1
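For readers without ind2vec and vec2ind at hand, the same round trip can be sketched with base MATLAB only. This is an illustration, not a replacement for the toolbox functions: the index vector is copied from the comment above, and the names I, rowOfOne and recovered are introduced here just for the example.

```matlab
% Class labels 0-9, copied from the example output shown above
trueclassindices = [5 2 6 7 4 0 1 3 8 9 5 0 6 3 8 4 7 2 9 1];

% One-hot targets: select columns of the 10x10 identity matrix
I = eye(10);
targets = I(:, trueclassindices + 1);    % 10 x 20, a single 1 per column

% Inverse mapping: row that holds the 1 in each column, shifted back to 0-9
[~, rowOfOne] = max(targets, [], 1);
recovered = rowOfOne - 1;

disp(isequal(recovered, trueclassindices))
```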
The range of the outputs depends on the output transfer function:
softmax: range (0,1); each output column sums to 1, so sum(outputs) = ones(1,N) for N input vectors
logsig : range (0,1); sum(outputs) is not constrained
purelin: neither the range nor the sum is constrained
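A small numeric check may make the difference concrete. The sketch below uses hand-written stand-ins for the three transfer functions rather than the toolbox versions, a 3-unit output layer instead of 10 to keep it short, and made-up net input values.

```matlab
% Assumed net input to a 3-unit output layer (made-up numbers)
n = [2.0; 0.5; -1.0];

% Hand-written stand-ins for the transfer functions
softmaxOut = exp(n - max(n)) / sum(exp(n - max(n)));  % each in (0,1), sum is 1
logsigOut  = 1 ./ (1 + exp(-n));                      % each in (0,1), sum is free
purelinOut = n;                                       % unbounded, sum is free

fprintf('softmax sum: %.4f\n', sum(softmaxOut));
fprintf('logsig  sum: %.4f\n', sum(logsigOut));
fprintf('purelin sum: %.4f\n', sum(purelinOut));
```

Only the softmax outputs are forced to add up to 1; the other two can be read as posterior estimates only to the extent that training made them so.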
All are assumed to give "consistent" estimates of the class posterior probabilities conditional on the input, P(i|input). The assigned class is determined by the maximum output:
y = net(x)
[ Pmax index ] = max(y)
assignedclassindex = index-1
However, for conditional classification, no class is assigned if Pmax is not large enough:
if Pmax >= threshold
    assignedclassindex = index-1 % 0-9
else
    assignedclassindex = 10 % not classified
end
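Applied to the variables in the question, the same test might look roughly like the sketch below. It is only an illustration: h stands in for the 10x1 output of the question's network, its values are made up, and the threshold of 0.8 is an arbitrary placeholder that would have to be tuned on real data.

```matlab
% Made-up stand-in for the 10x1 network output h from the question
h = [0.05 0.10 0.02 0.30 0.08 0.04 0.06 0.03 0.07 0.25]';
threshold = 0.8;               % assumed placeholder, not a tuned value

[Pmax, index] = max(h);        % largest output and its position (1..10)
if Pmax >= threshold
    digit = mod(index, 10);    % output unit 10 encodes the digit 0
    result = num2str(digit);
else
    result = 'NOT FOUND';      % no output is confident enough: reject
end
disp(result)
```

Using the two-output form of max also avoids the find(h==m) step, which can return more than one index if two outputs tie.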
The threshold is determined from the training and validation data.
It may be prudent to add nonclass data (for example, recordings of words that are not digits) to help determine the threshold.
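One possible way to pick the threshold, sketched under stated assumptions: collect Pmax for validation utterances that are digits and for nonclass utterances, then sweep candidate thresholds and keep the one with the lowest combined rate of wrongly rejected digits and wrongly accepted nonclass words. The Pmax values below are invented for illustration, and weighting both error types equally is a design choice, not something prescribed above.

```matlab
% Invented validation values of Pmax (illustration only)
pmaxDigits   = [0.97 0.91 0.88 0.99 0.76 0.93 0.86 0.96];  % spoken digits
pmaxNonclass = [0.41 0.63 0.57 0.82 0.37 0.58];            % words such as "eat"

candidates = 0.05:0.05:0.95;
cost = zeros(size(candidates));
for k = 1:numel(candidates)
    t = candidates(k);
    falseReject = mean(pmaxDigits < t);      % digits that would be rejected
    falseAccept = mean(pmaxNonclass >= t);   % nonclass words that would pass
    cost(k) = falseReject + falseAccept;     % equal weighting (assumption)
end

[minCost, best] = min(cost);
threshold = candidates(best);
fprintf('chosen threshold: %.2f (combined rate %.3f)\n', threshold, minCost);
```

If a false "digit" is more harmful than a NOT FOUND, the two rates could be weighted differently before summing.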
With forced classification:
outputs = [ y1 y2 y3... y20]
assignedclassindices = vec2ind(outputs)-1
err = ( assignedclassindices ~= trueclassindices )
% From this vector of zeros and ones, the error rate of each class can be determined. The total error rate is given by
N = length(trueclassindices) % 20 in this example
Nerr = sum(err), PctErr = 100*Nerr/N
However, with conditional classification, these calculations would have to be modified to include the nonclassification rate.
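A minimal sketch of such a modification, assuming rejected inputs are coded as class index 10 as above. The assigned indices are invented for illustration, and counting a rejected digit separately from a misclassified one is one possible bookkeeping choice among several.

```matlab
% Invented example: 10 test inputs, index 10 means "not classified"
trueclassindices     = [5 2 6 7 4 0  1 3 8 9];
assignedclassindices = [5 2 6 1 4 10 1 3 8 10];

N = numel(trueclassindices);
rejected = (assignedclassindices == 10);                       % no class assigned
err = (assignedclassindices ~= trueclassindices) & ~rejected;  % wrong digit assigned

PctReject  = 100*sum(rejected)/N;   % nonclassification rate
PctErr     = 100*sum(err)/N;        % misclassification rate
PctCorrect = 100 - PctErr - PctReject;

fprintf('correct %.0f%%, wrong %.0f%%, rejected %.0f%%\n', ...
        PctCorrect, PctErr, PctReject);
```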
Hope this helps.
Thank you for formally accepting my answer
Greg