Constructing the edit matrix for Levenshtein distance

To calculate the Levenshtein distance, the usual approach is dynamic programming. For this, we create an edit distance matrix like the one shown below:

[image: edit distance matrix]

Here is the code:

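while True:
    try:
        a = input()
        b = input()

        # (len(a)+1) x (len(b)+1) matrix; row 0 and column 0 stand for the empty prefix
        board = [[0 for j in range(len(b)+1)] for i in range(len(a)+1)]

        # turning the first i characters of a into "" takes i deletions
        for i in range(len(a)+1):
            board[i][0] = i
        # turning "" into the first j characters of b takes j insertions
        for j in range(len(b)+1):
            board[0][j] = j

        for i in range(1, len(a)+1):
            for j in range(1, len(b)+1):
                # substitution cost: 0 if the characters match, 1 otherwise
                if a[i-1] == b[j-1]:
                    d = 0
                else:
                    d = 1

                board[i][j] = min(board[i-1][j]+1,    # deletion
                                  board[i][j-1]+1,    # insertion
                                  board[i-1][j-1]+d)  # match or substitution

        print(board[-1][-1])

    except EOFError:
        # no more input pairs
        break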

So my question is: when we construct the matrix, why do we need to add 1 to len(a) and len(b)? As the picture above shows, only the red part of the matrix is the valid part. So I modified my code:

while True:
    try:
        a = input()
        b = input()

        board = [[0 for j in range(len(b))] for i in range(len(a))]

        for i in range(len(a)):
            board[i][0] = i

        for j in range(len(b)):
            board[0][j] = j

        for i in range(1, len(a)):
            for j in range(1, len(b)):
                if a[i] == b[j]:
                    d = 0
                else:
                    d = 1

                board[i][j] = min(board[i-1][j]+1,
                                  board[i][j-1]+1,
                                  board[i-1][j-1]+d)
        print(board[-1][-1])

    except:
        break 

I tested this modified code and it still gives the correct answer in most tests. But when both strings are very long, the result is 1 less than expected. I am very confused by this. Maybe this is a stupid question, but I would still appreciate an answer. Thank you.

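The problem with your solution is that it never handles a[0] and b[0]. Row 0 and column 0 of the matrix stand for the empty prefix, which is why the matrix needs len(a)+1 rows and len(b)+1 columns. In the original code the first comparison, at i = 1 and j = 1, is a[i-1] == b[j-1], i.e. a[0] against b[0]. In your version the loops also start at 1 but compare a[i] with b[j], so the first characters are skipped and you effectively compute the distance between a[1:] and b[1:], which can come out smaller than the true distance.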
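
To see the difference concretely, here is a minimal sketch that wraps both versions in functions so they can be compared on the same inputs. The function names levenshtein and levenshtein_without_border are made up for this illustration and do not appear in the question, and the second function assumes both strings are non-empty.

```python
def levenshtein(a, b):
    """Version with the extra row and column for the empty prefix."""
    rows, cols = len(a) + 1, len(b) + 1
    board = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        board[i][0] = i          # delete the first i characters of a
    for j in range(cols):
        board[0][j] = j          # insert the first j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            d = 0 if a[i - 1] == b[j - 1] else 1
            board[i][j] = min(board[i - 1][j] + 1,
                              board[i][j - 1] + 1,
                              board[i - 1][j - 1] + d)
    return board[-1][-1]


def levenshtein_without_border(a, b):
    """The modified version from the question (hypothetical name).

    Assumes a and b are non-empty. It never looks at a[0] or b[0].
    """
    board = [[0] * len(b) for _ in range(len(a))]
    for i in range(len(a)):
        board[i][0] = i
    for j in range(len(b)):
        board[0][j] = j
    for i in range(1, len(a)):
        for j in range(1, len(b)):
            d = 0 if a[i] == b[j] else 1
            board[i][j] = min(board[i - 1][j] + 1,
                              board[i][j - 1] + 1,
                              board[i - 1][j - 1] + d)
    return board[-1][-1]


if __name__ == "__main__":
    for a, b in [("ab", "cb"), ("kitten", "sitting"), ("abc", "abd")]:
        print(a, b,
              levenshtein(a, b),
              levenshtein_without_border(a, b),
              levenshtein(a[1:], b[1:]))
```

For each pair, the last two printed numbers agree: the version without the border gives the distance between a[1:] and b[1:]. When the first characters are equal, as in "abc" and "abd", that happens to match the true distance, which would explain why many tests still pass.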
