
Getting the same sha1 hash with all strings

I have a script that opens a file, looks for anything of the form HASH("<stuff>"), and replaces it with HASH(<sha1(stuff)>).

The entire script is this:

import sys
import re
import hashlib

def _hash(seq, trim_bits=64):
    assert trim_bits % 8 == 0
    temp = hashlib.sha1(seq).hexdigest()
    temp = int(temp, 16) & eval('0x{}'.format('F' * (trim_bits/4)))
    temp = hex(temp)
    return str(temp[2:]).replace('L', '')

if __name__ == '__main__':
    assert len(sys.argv) == 3
    in_file = sys.argv[1]
    out_file = sys.argv[2]
    with open(in_file, 'r') as f:
        lines = f.readlines()
        out_handle = open(out_file, 'w')
        for line in lines:
            new_line = re.sub(r'HASH\((["\'])(.*?)\1\)', 'HASH({})'.format(_hash(r'\2')), line)
            out_handle.write(new_line)
        out_handle.close()

When I run this, however, all of the sha1 hashes come out exactly the same, which doesn't make sense to me. If, instead of writing the hash, I use 'HASH({})'.format(r'\2') as the replacement, it substitutes the sequence of characters between the quotes. So why does the sha1 hash return the same string?

You are calculating the hash of the literal string r'\2'; the re module only expands that placeholder when it appears in the replacement string itself, and here it has already been consumed by _hash() before re.sub() runs.
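To make the order of evaluation concrete, here is a small sketch for Python 3 (the question's script is Python 2, so the explicit .encode() call is an assumption of this sketch). The names PATTERN, constant_digest, first and second are illustrative only. It shows that the digest is computed once, over the two characters backslash and 2, so every match receives the same text:

```python
import hashlib
import re

PATTERN = r'HASH\((["\'])(.*?)\1\)'

# Python evaluates the argument first: this digest is taken over the two
# literal characters backslash and "2", long before any matching happens.
constant_digest = hashlib.sha1(r'\2'.encode('ascii')).hexdigest()
replacement = 'HASH({})'.format(constant_digest)

# The replacement no longer contains a \2 placeholder, only a fixed digest.
first = re.sub(PATTERN, replacement, 'HASH("alpha")')
second = re.sub(PATTERN, replacement, 'HASH("beta")')
print(first == second)  # True: both inputs get the same constant digest
```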

Pass in the group from the match object instead, using a replacement function:

def replace_with_hash(match):
    return 'HASH({})'.format(_hash(match.group(2)))

new_line = re.sub(r'HASH\((["\'])(.*?)\1\)', replace_with_hash, line)

The replace_with_hash() function is passed the match object for each match, and its return value is used as the replacement. Now you can calculate the hash for the 2nd group!

Demo:

>>> import re
>>> def _hash(string):
...     return 'HASHED: {}'.format(string[::-1])
... 
>>> sample = '''\
... HASH("<stuff>")
... '''
>>> re.sub(r'HASH\((["\'])(.*?)\1\)', 'HASH({})'.format(_hash(r'\2')), sample)
'HASH(HASHED: 2\\)\n'
>>> def replace_with_hash(match):
...     return 'HASH({})'.format(_hash(match.group(2)))
... 
>>> re.sub(r'HASH\((["\'])(.*?)\1\)', replace_with_hash, sample)
'HASH(HASHED: >ffuts<)\n'

My _hash() function simply reverses the input string to show what happens.

The first re.sub() is your version; notice how its output contains '2\\', which is r'\2' reversed! My version neatly hashes <stuff> to >ffuts<.
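For completeness, here is how the replacement function might be combined with a real SHA-1 digest. This is a sketch for Python 3 rather than the original Python 2 script. The names short_sha1, rewrite_line and HASH_CALL are made up for this example, and UTF-8 encoding of the matched text is an assumption. The bit mask stands in for the eval() call in the question and keeps the same low trim_bits bits:

```python
import hashlib
import re

HASH_CALL = re.compile(r'HASH\((["\'])(.*?)\1\)')

def short_sha1(text, trim_bits=64):
    """Return the low trim_bits bits of the SHA-1 digest as lowercase hex."""
    assert trim_bits % 8 == 0
    # Assumption: the matched text is encoded as UTF-8 before hashing.
    digest = int(hashlib.sha1(text.encode('utf-8')).hexdigest(), 16)
    mask = (1 << trim_bits) - 1  # same effect as 0xFFFF... with trim_bits/4 digits
    return format(digest & mask, 'x')

def replace_with_hash(match):
    # Group 2 is the text between the quotes; group 1 is the quote character.
    return 'HASH({})'.format(short_sha1(match.group(2)))

def rewrite_line(line):
    """Replace every HASH("...") or HASH('...') in line with its trimmed digest."""
    return HASH_CALL.sub(replace_with_hash, line)

if __name__ == '__main__':
    print(rewrite_line('x = HASH("alpha"); y = HASH(\'beta\')'))
```

Because the function is called once per match, each quoted string gets its own digest, and lines without a HASH(...) call pass through unchanged.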
