
Compare two strings in Python

Well, I need to compare two strings, or at least find a sequence of characters from one string inside another string. The two strings contain MD5 hashes of files, which I must compare and report whether I find a match.

My current code is:

def comparemd5():
    origmd5=getreferrerurl()
    dlmd5=md5_for_file(file_name)
    print "original md5 is",origmd5
    print "downloader file md5 is",dlmd5
    s = difflib.SequenceMatcher(None, origmd5, dlmd5)
    print "ratio is:",s.ratio()

The output I get is:

original md5 is ['0430f244a18146a0815aa1dd4012db46', '0430f244a18146a0815aa1dd40
12db46', '59739CCDA2F15D5AC16DB6695CAE3378']

downloader file md5 is 59739ccda2f15d5ac16db6695cae3378

ratio is : 0.0

So there is a match for dlmd5 in origmd5, but somehow it's not finding it. I am doing something wrong somewhere. Please help me out :/

Basically, you want the idiom if test_string in list_of_strings. It looks like you don't need case sensitivity, so you might want

if test_string.lower() in (s.lower() for s in list_of_strings)

In your case:

>>> originals = ['0430f244a18146a0815aa1dd4012db46', '0430f244a18146a0815aa1dd40 12db46', '59739CCDA2F15D5AC16DB6695CAE3378']
>>> test = '59739ccda2f15d5ac16db6695cae3378'
>>> if test.lower() in (s.lower() for s in originals):
...    print '%s is match, yeih!' % test
... 
59739ccda2f15d5ac16db6695cae3378 is match, yeih!
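If the same check is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch, written for Python 3; the names normalize and is_known_md5 are made up for illustration and do not come from the question. It also strips whitespace, on the assumption that the space in the second entry is stray formatting rather than part of the value.

```python
def normalize(md5_hex):
    """Lower-case a hex digest and drop any whitespace inside it."""
    return "".join(md5_hex.split()).lower()


def is_known_md5(candidate, known_md5s):
    """Return True if candidate equals any known digest, ignoring case and spaces."""
    return normalize(candidate) in {normalize(item) for item in known_md5s}


originals = [
    "0430f244a18146a0815aa1dd4012db46",
    "0430f244a18146a0815aa1dd40 12db46",
    "59739CCDA2F15D5AC16DB6695CAE3378",
]
test = "59739ccda2f15d5ac16db6695cae3378"

if is_known_md5(test, originals):
    print("%s is a match" % test)
```

Building a set of normalized values keeps the membership test cheap if the list of reference hashes grows.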

Looks like you're having a problem because the case of the letters doesn't match. You may want to try:

def comparemd5():
    origmd5=[item.lower() for item in getreferrerurl()]
    dlmd5=md5_for_file(file_name)
    print "original md5 is",origmd5
    print "downloader file md5 is",dlmd5
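    # SequenceMatcher would compare whole list items with single characters
    # of dlmd5 and always report 0.0, so test for membership instead
    print "match found:", dlmd5.lower() in origmd5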

Given the input:

original md5 is ['0430f244a18146a0815aa1dd4012db46', '0430f244a18146a0815aa1dd40 12db46', '59739CCDA2F15D5AC16DB6695CAE3378']

downloader file md5 is 59739ccda2f15d5ac16db6695cae3378

You have two problems.

First of all, that first value isn't just an MD5, but an MD5 and two other things.

To fix that: If you know that origmd5 will always be in this format, just use origmd5[2] instead of origmd5. If you have no idea what origmd5 is, except that one of the things in it is the actual MD5, you'll have to compare against all of the elements.

Second, the actual MD5 values are both hex strings representing the same binary data, but they're different hex strings (because one is in uppercase, the other in lowercase). You could fix this by just doing a case-insensitive comparison, but it's probably more robust to unhexlify them both and compare the binary values.

In fact, if you've copied and pasted the output correctly, at least one of those hex strings has a space in the middle of it, so you actually need to unhexlify hex strings with optional spaces between hex pairs. AFAIK, there is no stdlib function that does this, but you can write it yourself in one step:

import binascii

def unhexlify(s):
    return binascii.unhexlify(s.replace(' ', ''))
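To see what this helper buys you, here is a short sketch, assuming Python 3, that feeds it the values from the question. The variable names are illustrative only. It also shows bytes.fromhex, which in Python 3 skips spaces between byte pairs by itself and may be a reasonable alternative there.

```python
import binascii


def unhexlify(hex_string):
    # Strip spaces first: binascii.unhexlify rejects them.
    return binascii.unhexlify(hex_string.replace(" ", ""))


upper = "59739CCDA2F15D5AC16DB6695CAE3378"
lower = "59739ccda2f15d5ac16db6695cae3378"
spaced = "0430f244a18146a0815aa1dd40 12db46"

# Case does not affect the decoded bytes.
same_bytes = unhexlify(upper) == unhexlify(lower)
print(same_bytes)

# Without the helper, the embedded space makes decoding fail.
try:
    binascii.unhexlify(spaced)
    raw_call_failed = False
except ValueError:  # binascii.Error is a ValueError subclass in Python 3
    raw_call_failed = True
print(raw_call_failed)

# In Python 3, bytes.fromhex skips spaces between byte pairs on its own.
print(bytes.fromhex(spaced) == unhexlify(spaced))
```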

Meanwhile, I'm not sure why you're trying to use difflib.SequenceMatcher at all. Two slightly different MD5 hashes refer to completely different original sources; that's kind of the whole point of MD5, and crypto hash functions in general. There's no such thing as a 95% match; there's either a match, or a non-match.

So, if you know the 3rd value in origmd5 is the one you want, just do this:
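A quick experiment may make this more concrete. The sketch below assumes Python 3, and the sample inputs (b"hello world" and friends) are made up for illustration. It shows two things: passing a list and a string to SequenceMatcher yields 0.0 because whole list items are compared against single characters, and two inputs that differ by a single byte produce different digests, so any similarity ratio between the digests says nothing about the files.

```python
import difflib
import hashlib

originals = [
    "0430f244a18146a0815aa1dd4012db46",
    "59739CCDA2F15D5AC16DB6695CAE3378",
]
downloaded = "59739ccda2f15d5ac16db6695cae3378"

# List vs. string: each list item (a 32-character string) is compared with
# each single character of the other string, so nothing can match.
list_ratio = difflib.SequenceMatcher(None, originals, downloaded).ratio()
print(list_ratio)  # 0.0

# Inputs differing by one byte hash to different digests.
digest_a = hashlib.md5(b"hello world").hexdigest()
digest_b = hashlib.md5(b"hello world!").hexdigest()
string_ratio = difflib.SequenceMatcher(None, digest_a, digest_b).ratio()
print(digest_a == digest_b)  # False
print(string_ratio)  # below 1.0, and not a meaningful measure of the inputs
```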

s = unhexlify(origmd5[2]) == unhexlify(dlmd5)

Otherwise, do this:

s = any(unhexlify(origthingy) == unhexlify(dlmd5) for origthingy in origmd5)

Or, turning it around to make it simpler:

s = unhexlify(dlmd5) in map(unhexlify, origmd5)

Or whatever equivalent you find most readable.
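Putting the pieces together, an end-to-end check might look roughly like the following. This is a sketch under assumptions, written for Python 3: md5_of_file is a hypothetical stand-in for the question's md5_for_file (whose body was not shown), file_matches_any is an invented name, and the temporary file exists only to make the demo self-contained.

```python
import binascii
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Hypothetical stand-in for md5_for_file: hash a file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()  # raw 16 bytes, not a hex string


def file_matches_any(path, reference_hex_md5s):
    """True if the file's MD5 equals any reference hex digest (case and spaces ignored)."""
    actual = md5_of_file(path)
    return any(
        binascii.unhexlify(ref.replace(" ", "")) == actual
        for ref in reference_hex_md5s
    )


# Demo with a throwaway file whose content is known.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

expected_hex = hashlib.md5(b"hello world").hexdigest().upper()
references = ["0430f244a18146a0815aa1dd40 12db46", expected_hex]

result = file_matches_any(demo_path, references)
no_match = file_matches_any(demo_path, ["0430f244a18146a0815aa1dd4012db46"])
os.remove(demo_path)

print(result)
print(no_match)
```

Comparing the raw digest bytes avoids any dependence on how the reference strings happen to be cased or spaced.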
