
Calculating closest string match from a list of strings

I'm trying to find a way to calculate/determine the closest string match from a list of strings.

Here is the string that I want to find the closest match to: CTGGAG

From a list of strings:

matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

I've tried using the SequenceMatcher from difflib:

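from difflib import SequenceMatcher

seqratiomatchlist = []
for t in matchlist:
    # Compare the target string against each candidate in the list
    assignseqmatch = SequenceMatcher(None, 'CTGGAG', t)
    ratio = assignseqmatch.ratio()
    seqratiomatchlist.append(ratio)
    # neutralhex and neutralmatchscores are defined elsewhere in my code
    for r, s in zip(seqratiomatchlist, neutralhex):
        neutralmatchscores[r].append(s)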

However, when I use this method, the first four values in the list are all reported to have the same ratio (0.833333), when the third and fourth values in the list should have the highest ratio, since there is only a one-letter difference between CTGGAG, CTGGAA, and CTGGTG. I basically just want to calculate how many letter changes there are between the two strings. Is this possible?

To find the number of letter changes between two equal-length strings x and y, do the following:

numChanges = sum(i != j for i, j in zip(x, y))
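Applied to the list from the question, this could look like the sketch below. The names hamming_distance, target, distances, and closest are introduced here for illustration only and are not part of the original answer. The count is strictly position by position, so it assumes the strings have equal length, and a shifted string such as ACTGGA is scored as five changes rather than as a near match.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        # zip() would silently truncate, so reject unequal lengths explicitly
        raise ValueError("strings must have the same length")
    return sum(i != j for i, j in zip(x, y))


target = 'CTGGAG'
matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

# Number of letter changes between the target and each candidate
distances = {s: hamming_distance(target, s) for s in matchlist}

# Candidate with the fewest changes; on a tie, min() keeps the first one
closest = min(matchlist, key=lambda s: hamming_distance(target, s))

print(distances)
print(closest)
```

Here CTGGAA and CTGGTG each differ from the target in one position, while the exact match CTGGAG differs in none and is therefore returned as the closest string.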
