
Calculation of Levenshtein distance

I'm not sure whether this question is a duplicate, but I would like to know more about optimized Levenshtein distance algorithm implementations in R, Java, or Python. I have a text file that contains numerous strings, one per line (close to 2000 records, as shown below), in alphabetical order, and some of them may be similar to one another. I want to compare all pairs of strings in the file and output the distance matrix. Also, please let me know how to use this matrix to filter the set of strings based on a requirement such as LD <= 2.

Get back to me if the question is not clear and you need more information.

Sample Text File
----------------
abc
abcd
abe
bac
bad
back
blade
cub
cube
cute
dump
duke
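
A minimal brute-force sketch in Python of what the question asks for: a standard-library-only Levenshtein function (Wagner-Fischer dynamic programming), the full pairwise distance matrix, and a filter for pairs with LD <= 2. The file name words.txt is a placeholder for your own file.

    def levenshtein(a, b):
        # Classic Wagner-Fischer dynamic programming with two rolling rows.
        if len(a) < len(b):
            a, b = b, a
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                current.append(min(previous[j] + 1,          # deletion
                                   current[j - 1] + 1,       # insertion
                                   previous[j - 1] + cost))  # substitution
            previous = current
        return previous[-1]

    with open("words.txt") as fh:              # placeholder file name
        words = [line.strip() for line in fh if line.strip()]

    # Full n x n distance matrix; O(n^2) comparisons is workable for ~2000 words.
    matrix = [[levenshtein(w1, w2) for w2 in words] for w1 in words]

    # Keep only the pairs whose Levenshtein distance is at most 2.
    close_pairs = [(words[i], words[j], matrix[i][j])
                   for i in range(len(words))
                   for j in range(i + 1, len(words))
                   if matrix[i][j] <= 2]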

This can be done in a slightly reversed manner. Create a dictionary mapping every word to an empty neighbor list, e.g. d = {line.strip(): [] for line in file} (stripping the newline from each line). Now:

for word in d:
    for neighbor in edit_distance_1(word):
        if neighbor in d:
            d[word].append(neighbor)
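
The loop above calls edit_distance_1(), which is not defined in the answer. A plausible sketch of that helper, generating every string exactly one insertion, deletion, or substitution away, assuming lowercase ASCII words (adjust the alphabet to your data):

    import string

    def edit_distance_1(word, letters=string.ascii_lowercase):
        # All strings one insertion, deletion, or substitution away from word.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + replaces + inserts) - {word}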

Now d is a graph mapping each word to its edit-distance-1 neighbors that also appear in the file. You can follow these edges one step further to reach words at edit distance 2 (via an intermediate word that is itself in the file), which I believe is what you want.
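
A sketch of that second step, assuming d has been built as above: for each word, collect everything reachable within two edges. Note that this only finds distance-2 words connected through an intermediate word that is itself in the file.

    within_two = {}
    for word in d:
        reachable = set(d[word])            # edit-distance-1 neighbors
        for neighbor in d[word]:
            reachable.update(d[neighbor])   # their neighbors: distance <= 2
        reachable.discard(word)
        within_two[word] = sorted(reachable)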
