Info

La pregunta está cerrada. Vuélvala a abrir para editarla o responderla.

Ignore Deletions with Edit Distances (String Editing)

1 visualización (últimos 30 días)
Marcel Dorer
Marcel Dorer el 22 de Abr. de 2016
Cerrada: MATLAB Answer Bot el 20 de Ag. de 2021
Hi, I'm trying to compare 2 strings with a function based on Miguel Castro's EditDist.m function. The function works pretty well but in my case I need to ignore some of the Deletions, namely all in the beginning and the end of the string.
For example when I compare the 2 Strings 'XXXXMatlabXXXX' and 'YYMatlabYY' the first 2 'X' and the last 2 'X' which would be deletions shouldn't count towards the EditDistance value (which should be 4 in this case). Basically one of the 2 strings has a random number of random surrounding values that should be ignored, deletions after the first Insertion/Replacement/Correct Value should be counted normally, at least until there is only a tail of deletions left.
Help would be really appreciated!
Here is the relevant part of the function I'm using:
for i = 1:n1
D(i+1,1) = D(i,1) + DelCost;
end;
for j = 1:n2
D(1,j+1) = D(1,j) + InsCost;
end;
for i = 1:n1
for j = 1:n2
if s1(i) == s2(j)
Repl = 0;
else
Repl = ReplCost;
end;
D(i+1,j+1) = min([D(i,j)+Repl D(i+1,j)+DelCost D(i,j+1)+InsCost]);
end;
end;
d = D(n1+1,n2+1);

Respuestas (1)

Arnab Sen
Arnab Sen el 26 de Abr. de 2016
Editada: Arnab Sen el 27 de Abr. de 2016
Hello Marcel,
I am assuming that between two strings s1 and s2, s1 is known to be the one which is wrapped with some redundant characters.
Now, let's dig into what is meant by D(i,j) in the script. It means that the conversion cost of s1.substring(1,i) to s2.substring(1,j) and vice verse. Now, let's assume that after kth index of s1, all the indices are redundant. So,
D(n1,n2)=D(k,n2)+(n1-k)*DelCost.
So, Now the task is simple. We need to find out the value of k. Following code snippet should do that:
i=n1;
while(D(i,n2)-D(i-1,n2)==DelCost)
{
i=i-1;
}
k=i;
So, the last (n1-k) chars are redundant in s1.
Now we need to find out the front end redundant characters in s1. For this we can create another table (say X) where
X(i,j)= The conversion cost of s1.subtring(i,n1) to s2.sunstring(j,n2) and adopt similar approach.
A simpler approach would be just reverse the string s1 (say s1')and s2 (s2') and call edit distance again and perform same workflow. Now redundant character at the end of s1' are the redundant characters in the front end of the original string s1.
At the end subtracts DelCost*(number of total redundant characters in s1) from the original output.
  2 comentarios
Marcel Dorer
Marcel Dorer el 26 de Abr. de 2016
Thanks a lot for the answer, it was pretty helpful and I understand the principle. There is only 1 thing I fail to understand:
{
i--;
}
I'm no matlab expert and I have to admit that I've never seen an expression like that. If I try to use that part in matlab a bracket error occurs. I'd really appreciate if you could explain this a little more!
Arnab Sen
Arnab Sen el 26 de Abr. de 2016
Editada: Arnab Sen el 26 de Abr. de 2016
Hi,
You are correct. MATLAB does not recognize i--. It's common in languages like C, C++, Java. Please consider the expression as
{
i=i-1;
}
I have edited the original answer as well accordingly. Thanks for pointing this out.
Please accept the answer if this helps.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by