
Python regex for MD5 hash

I've come up with:

re.findall("([a-fA-F\d]*)", data)

but it's not very foolproof. Is there a better way to grab all MD5 hashes?

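Well, since an MD5 hash is just a string of 32 hexadecimal digits, what you can add to your expression is a check for "32 digits", perhaps something like this?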

re.findall(r"([a-fA-F\d]{32})", data)
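To see why the quantifier matters, here is a small sketch comparing the `*` pattern from the question with the `{32}` version. The sample `data` string is made up for illustration.

```python
import re

# Made-up sample text: a file name followed by an MD5-looking string.
data = "file.bin  900e3f2dd4efc9892793222d7a1cee4a"

# `*` accepts runs of any length, including empty ones.
loose = re.findall(r"([a-fA-F\d]*)", data)
# `{32}` accepts only runs of exactly 32 hex characters.
exact = re.findall(r"([a-fA-F\d]{32})", data)

print(loose)  # contains empty strings and short fragments such as "f" and "e"
print(exact)
```

The `{32}` version still has no boundary check, so it will also match the first 32 characters of a longer hexadecimal run.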

When using regular expressions in Python, you should almost always use the raw string syntax r"...":

re.findall(r"([a-fA-F\d]{32})", data)

This will ensure that the backslash in the string is not interpreted by the normal Python escaping, but is instead passed through to the re.findall function so it can see the \d verbatim. In this case you are lucky that \d is not an escape sequence Python recognizes (although recent Python versions emit a warning for it in a non-raw string), but something like \b (which has completely different meanings in Python escaping and in regular expressions) would be interpreted.

See the re module documentation for more information.
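A minimal sketch of the `\b` pitfall described above. The sample `data` string and the variable names are made up for illustration.

```python
import re

data = "md5: 900e3f2dd4efc9892793222d7a1cee4a"

# In a normal string literal, "\b" is the backspace character (one char).
plain_pattern = "\b[a-f0-9]{32}\b"
# In a raw string, the backslash survives and re sees a word boundary.
raw_pattern = r"\b[a-f0-9]{32}\b"

print(len("\b"), len(r"\b"))            # 1 character vs. 2 characters
print(re.findall(plain_pattern, data))  # looks for literal backspaces around the hash
print(re.findall(raw_pattern, data))    # looks for word boundaries around the hash
```

The non-raw pattern finds nothing here because the text contains no backspace characters, while the raw pattern finds the hash.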

Here's a better way to do it than some of the other solutions:

re.findall(r'(?i)(?<![a-z0-9])[a-f0-9]{32}(?![a-z0-9])', data)

This ensures that the match is a string of exactly 32 hexadecimal characters that is not embedded in a longer run of alphanumeric characters. With patterns that lack such a boundary check, a run of 37 contiguous hexadecimal characters would yield a match on its first 32, and a run of 64 would be split in half with each half reported as an independent match. The lookbehind and lookahead assertions exclude these cases; they are zero-width, so they do not affect the contents of the match.

Note also the (?i) flag, which makes the pattern case-insensitive and saves a little typing, and that wrapping the entire pattern in parentheses is superfluous.
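The difference can be checked with a short sketch that runs the lookaround pattern and an unanchored `{32}` pattern over the same text. The sample text and the names `naive` and `guarded` are assumptions made for this illustration.

```python
import re

h = "900e3f2dd4efc9892793222d7a1cee4a"  # 32 hex characters

# One standalone hash, one 37-character run, and one 64-character run.
data = f"ok {h} long {h}4a4a4 double {h}{h}"

naive = re.compile(r"[a-fA-F\d]{32}")
guarded = re.compile(r"(?i)(?<![a-z0-9])[a-f0-9]{32}(?![a-z0-9])")

naive_hits = naive.findall(data)
guarded_hits = guarded.findall(data)

print(len(naive_hits))  # also counts pieces of the 37- and 64-character runs
print(guarded_hits)     # only the standalone hash
```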

Here's a fairly pedantic expression:

r"\b([a-f\d]{32}|[A-F\d]{32})\b"
  • string must be exactly 32 characters long,
  • string must sit between word boundaries (newline, space, etc.),
  • letters must be all lowercase a-f OR all uppercase A-F, but not mixed.

But if that just ain't good enough for ya, because you know there is only about a 1 in 3,402,823 chance of getting an all-numeric MD5 checksum, and about a 1 in 42 trillion chance of an all-letter MD5 checksum, then you know we should probably say FU to those valid sums and only accept strings that mix digits and letters:

r"\b(?!^[\d]*$)(?!^[a-fA-F]*$)([a-f\d]{32}|[A-F\d]{32})\b"

00000000000000000000000000000000 # not MD5
01110101001110011101011010101001 # not MD5
ffffffffffffffffffffffffffffffff # not MD5
A32efC32c79823a2123AA8cbDDd3231c # not MD5
affa687a87f8abe90d9b9eba09bdbacb # is MD5
C787AFE9D9E86A6A6C78ACE99CA778EE # is MD5
please like and subscribe to my  # not MD5

yes i've been terribly bored at work.
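Note that the `^` and `$` inside the lookaheads of the stricter pattern only take effect when the candidate is the entire string (or an entire line with `re.MULTILINE`). The sketch below is a variant, not the original author's pattern: it swaps those anchors for fixed-length `{32}\b` checks so the same idea also works when a comment or other text follows the hash. The name `STRICT_MD5` is hypothetical.

```python
import re

# Variant of the pattern above: the lookaheads check the next 32 characters
# directly instead of relying on ^ and $ (an assumption of this sketch).
STRICT_MD5 = re.compile(
    r"\b(?!\d{32}\b)(?![a-fA-F]{32}\b)([a-f\d]{32}|[A-F\d]{32})\b"
)

lines = [
    "00000000000000000000000000000000 # not MD5",
    "01110101001110011101011010101001 # not MD5",
    "ffffffffffffffffffffffffffffffff # not MD5",
    "A32efC32c79823a2123AA8cbDDd3231c # not MD5",
    "affa687a87f8abe90d9b9eba09bdbacb # is MD5",
    "C787AFE9D9E86A6A6C78ACE99CA778EE # is MD5",
    "please like and subscribe to my  # not MD5",
]

for line in lines:
    verdict = "is MD5" if STRICT_MD5.search(line) else "not MD5"
    print(line[:32], verdict)

# Candidates the pattern accepts, for comparison with the annotations.
accepted = [line[:32] for line in lines if STRICT_MD5.search(line)]
```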

MD5 Python Regex With Examples

An MD5 hash consists of exactly 32 hexadecimal characters, and it may be presented with either lowercase or uppercase letters, so the pattern should account for both.


The below example was tested against four different strings:

  • A valid lowercase MD5 hash
  • A valid uppercase MD5 hash
  • A string of 64 hexadecimal characters (to ensure a split & match wouldn't occur)
  • A string of 37 hexadecimal characters (to ensure the leading 32 characters wouldn't match)

900e3f2dd4efc9892793222d7a1cee4a

AC905DD4AB2038E5F7EABEAE792AC41B

900e3f2dd4efc9892793222d7a1cee4a900e3f2dd4efc9892793222d7a1cee4a

900e3f2dd4efc9892793222d7a1cee4a4a4a4


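    import re

    # datahere holds the text to check, e.g. one of the four test strings above
    datahere = "900e3f2dd4efc9892793222d7a1cee4a"

    # \b on both sides rejects runs longer than 32 hex characters
    validHash = re.finditer(r'(?=(\b[A-Fa-f0-9]{32}\b))', datahere)
    result = [match.group(1) for match in validHash]

    if result:
        print("Valid MD5")
    else:
        print("NOT a Valid MD5")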
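If the goal is to validate a single candidate string rather than search a larger text, `re.fullmatch` is an alternative to `\b` anchoring. This is a minimal sketch of that swapped-in approach, not the method used above. The helper name `is_md5` is hypothetical, and it only checks the shape of the string, not whether it is the hash of anything.

```python
import re

MD5_SHAPE = re.compile(r"[A-Fa-f0-9]{32}")

def is_md5(candidate: str) -> bool:
    """Return True if the whole string is exactly 32 hex characters."""
    return MD5_SHAPE.fullmatch(candidate) is not None

# The four test strings listed above.
tests = [
    "900e3f2dd4efc9892793222d7a1cee4a",
    "AC905DD4AB2038E5F7EABEAE792AC41B",
    "900e3f2dd4efc9892793222d7a1cee4a900e3f2dd4efc9892793222d7a1cee4a",
    "900e3f2dd4efc9892793222d7a1cee4a4a4a4",
]

results = [is_md5(t) for t in tests]
for t, ok in zip(tests, results):
    print("Valid MD5" if ok else "NOT a Valid MD5", t)
```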

How about "([a-fA-F\d]{32})" to require it to be 32 characters long?


 