简体   繁体   中英

Levenshtein edit distance Python

This piece of code returns the Levenshtein edit distance of 2 terms. How can i make this so that insertion and deletion only costs 0.5 instead of 1 ? substitution should still costs 1.

def substCost(x,y):
   if x == y: 
      return 0
   else: 
      return 1

def  levenshtein(target, source):
   i = len(target); j = len(source)
   if i == 0:  
      return j
   elif j == 0: 
      return i

   return(min(levenshtein(target[:i-1],source)+1,
          levenshtein(target, source[:j-1])+1,
          levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1])))

There are two places you need to account for the reduced cost of adding or removing a vowel. They are the return j and return i lines in the base cases of your function, and the +1 s in the min call after the first two recursive calls.

We need to change each of those to use a "ternary" expression: 0.5 if ch in 'aeiou' else 1 in place of assuming a cost of 1 per character added or removed.

For the base cases, we can replace the return values with sum calls on a generator expression that includes the ternary expression:

if i == 0:  
    return sum(0.5 if ch in 'aeiou' else 1 for ch in source)
elif j == 0: 
    return sum(0.5 if ch in 'aeiou' else 1 for ch in target)

For later cases, we can replace the +1 with the ternary expression itself (with an index rather than the ch iteration variable):

return min(levenshtein(target[:i-1],source) + (0.5 if target[-1] in 'aeiou' else 1),
           levenshtein(target, source[:j-1]) + (0.5 if source[-1] in 'aeiou' else 1),
           levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1]))

If you wanted to generalize this, you could move the ternary expression out into its own function, named something like addCost , and call it from the code in the levenshtein function.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM