简体   繁体   中英

levenshtein distance with items in list in python

I have two list, below, and i want to compare if words that are similar levenshtein distance of less than 2. I have a function to find the levenshtein distance, however as parameters it needs the two words. I can find which words are not in the other list, but it is not helping. And I can go index by index but as in the case below when i get to index 7 (but and except) everything is thrown off because infidelity will be index 9 and 8 and wcop88 is 9 and 10 hence those won't be compare. Is there some way to say if part of infidelity is in some word in the other list then check those two, note this won't always work because say if infidelity and infedellty there is only the in and ty that can match and many words could possibly match that

[u'rt', u'cuaimatizada', u's', u'cuaimaqueserespeta', u'forgives', u'any', u'mistake', u'but', u'the', u'infidelity', u'wocp88']
[u'rt', u'cuiamatizada', u's', u'cuimaqueserespeta', u'forgive', u'any', u'mistake', u'except', u'infedelity', u'wcop88']

Edit: So my goal is to be able to feed my levenshtein function the two words the need to be check. In this case the following pairs:

u'cuaimatizada      u'cuiamatizada

u'cuaimaqueserespeta u'cuimaqueserespeta

u'forgives   u'forgive

u'infedelity  u'infidelity

u'wocp88 u'wcop88

I do not know which words before hand.

I think this is what you want ... but it compares all words... not just matching indexes

 wordpairs = [(w1,w2) for w1 in list1 for w2 in list2 if levenstein(w1,w2) < 2]

>>> matches = [(w1,w2) for w1 in l12 for w2 in l22 if levenshtein(w1,w2) < 2]

[(u'rt', u'rt'), (u's', u's'), (u'cuaimaqueserespeta', u'cuimaqueserespeta'), (u'forgives', u'forgive'), (u'any', u'any'), (u'mistake', u'mistake'), (u'infidelity',u'infedelity')]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM