python 3.9 - Unable to get the correct SHA1 hash for multiple files in a loop

Question

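Using the code from the solution in the link given at the bottom, I am not getting the correct SHA1 hash for the second and subsequent files in a loop. Here is why I say it is incorrect.

With the code given below:

CORRECT -> when I generate the SHA1 hash for the same file on its own (by executing the code twice, once per location), I get the same SHA1 hash both times (correct), and
INCORRECT -> when I generate hashes for multiple files in a single execution, including this file, I get a different hash for this file (incorrect).

Please suggest whether this code needs any modification or whether I should choose a different approach.

Code written by referring to the link given at the bottom ->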

import glob
import hashlib
import os

path = input("Please provide path to search for file pattern (search will be in this path sub-directories also: ")
filepattern = input("Please provide the file pattern to search in given path. Example *.jar, *abc*.jar.: ")
assert os.path.exists(path), "I did not find the path " + str(path)
path = path.rstrip("/")
tocheck = (f'{path}/**/{filepattern}')
hash_obj = hashlib.sha1()

searched_file_list = glob.iglob(tocheck, recursive=True)
for file in searched_file_list:
    print(f'{file}')
    try:
        checksum = ""
        file_for_sha1 = ""
        file_for_sha1 = open(file, 'rb')
        hash_obj.update(file_for_sha1.read())
        checksum = hash_obj.hexdigest()
        print(f'sha1 for file ({file})= {checksum}')
    finally:
        file_for_sha1.close()

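Sample file -> abc.txt, created in /home/test/git/reader/cabin/ with the following text: Hello, this is for testing the SHA1 code.

I then copied this file to another location, i.e. /home/test/git/reader/check/cabin/

Linux console output shows that the SHA1 is the same for both files: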

:~/git/reader/check/cabin$ sha1sum abc.txt
fc4db67f46711b2c18bd133abd67965649edfffc  abc.txt
:~/git/reader/check/cabin$ cd ../..
:~/git/reader$ cd cabin/
:~/git/reader/cabin$ sha1sum abc.txt
fc4db67f46711b2c18bd133abd67965649edfffc  abc.txt

Code with the loop in a single execution, producing two different SHA1 hashes for this abc.txt file from the two locations:

sha1 for file (/home/test/git/reader/cabin/abc.txt)= fc4db67f46711b2c18bd133abd67965649edfffc
sha1 for file (/home/test/git/reader/check/cabin/abc.txt)= a4691598ea25ea4c7404369a685725115c7f305b

Code executed twice for the same file, giving the respective location each time (one file at a time), producing the same and correct SHA1 hash:

sha1 for file (/home/test/git/reader/check/cabin/abc.txt)= fc4db67f46711b2c18bd133abd67965649edfffc
sha1 for file (/home/test/git/reader/cabin/abc.txt)= fc4db67f46711b2c18bd133abd67965649edfffc

Reference code link -> Generating one MD5/SHA1 checksum of multiple files in Python

Answer 1

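Quoting the documentation of the update method:

Repeated calls are equivalent to a single call with the concatenation of all the arguments: m.update(a); m.update(b) is equivalent to m.update(a+b).

So, instead of finding the hash of each of the two files separately, you are finding the hash of the two files concatenated. That is what the question you linked is doing: a single hash for multiple files. You want a hash per file, so rather than calling update several times on the same hash_obj instance, create a new instance for each file, so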
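
The quoted behavior can be checked directly. The following is a minimal sketch using two made-up byte strings in place of real file contents (the names a, b, and shared are illustrative only, not from the question):

```python
import hashlib

# Stand-ins for the contents of two files (illustrative values only).
a = b"contents of the first file"
b = b"contents of the second file"

# One hash object reused for both inputs, as in the question's loop.
shared = hashlib.sha1()
shared.update(a)
first_digest = shared.hexdigest()   # digest of a alone
shared.update(b)
second_digest = shared.hexdigest()  # digest of a + b, not of b alone

# The first digest is right, which is why the first file looked correct.
print(first_digest == hashlib.sha1(a).hexdigest())
# The second digest covers the concatenation of both inputs.
print(second_digest == hashlib.sha1(a + b).hexdigest())
# It therefore differs from the digest of the second input on its own.
print(second_digest == hashlib.sha1(b).hexdigest())
```

This matches the symptom in the question: the first file in the loop gets the expected hash, and every later file gets a hash of everything read so far.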

hash_obj = hashlib.sha1()
searched_file_list = glob.iglob(tocheck, recursive=True)
for file in searched_file_list:
    print(f'{file}')
    try:
        ...
        hash_obj.update(file_for_sha1.read())

becomes

searched_file_list = glob.iglob(tocheck, recursive=True)
for file in searched_file_list:
    print(f'{file}')
    try:
        hash_obj = hashlib.sha1()
        ...
        hash_obj.update(file_for_sha1.read())
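
For completeness, here is one way the whole loop might look with a fresh hash object per file. This is only a sketch: the helper names sha1_of_file and sha1_of_matching_files, the chunk_size parameter, and the chunked reading are additions for illustration and are not part of the original code, which reads each file in one call.

```python
import glob
import hashlib
import os


def sha1_of_file(file_path, chunk_size=65536):
    """Return the SHA1 hex digest of a single file (hypothetical helper)."""
    hash_obj = hashlib.sha1()  # new object for every file
    with open(file_path, "rb") as fh:
        # Reading in chunks avoids loading a large file into memory at once;
        # repeated update calls on the SAME file are exactly what we want here.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()


def sha1_of_matching_files(path, filepattern):
    """Map each file under path matching filepattern to its own SHA1 digest."""
    tocheck = os.path.join(path, "**", filepattern)
    checksums = {}
    for file in glob.iglob(tocheck, recursive=True):
        if os.path.isfile(file):  # skip directories that match the pattern
            checksums[file] = sha1_of_file(file)
    return checksums
```

Calling sha1_of_matching_files with the path and file pattern from the question and printing each entry should then report the same digest for identical copies of a file, since no state is carried over between files.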
