How to generate an OS-agnostic file hash in Python?

I am trying to hash a file with Python's hashlib library by reading its contents in binary chunks of 4096 bytes.

The issue is that it generates a different hash for the same file on Windows and on Mac.

What is more interesting is that the file is present in a git repo and when pushed to a remote server from Windows and Mac, it generates different hashes for the two scenarios.

I understand that the cause is the line endings, which are '\r\n' on Windows and '\n' on Mac.
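To make the line-ending effect concrete, here is a minimal sketch. The byte strings and the helper name md5_hex are made up for illustration; they stand in for the same text file as it might be checked out on each platform.

```python
import hashlib

# Hypothetical contents of the "same" text file on each platform.
unix_bytes = b"line one\nline two\n"
windows_bytes = b"line one\r\nline two\r\n"


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# The raw bytes differ, so the digests differ.
print(md5_hex(unix_bytes) == md5_hex(windows_bytes))  # False

# After normalizing '\r\n' to '\n', the digests match.
normalized = windows_bytes.replace(b"\r\n", b"\n")
print(md5_hex(normalized) == md5_hex(unix_bytes))  # True
```

A hash only sees bytes, so any difference in how the file is stored on disk shows up in the digest.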

This is the original code, which generated different hashes:

import hashlib

def get_file_hash(file_path: str) -> str:
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

As a quick fix, we replaced '\r\n' with '\n':

def get_file_hash(file_path: str) -> str:
    hash_md5 = hashlib.md5()
    with open(file_path, "r") as f:
        for chunk in f.readlines():
            encoded_chunk = chunk.encode("utf-8").replace(b"\r\n", b"\n")
            print(encoded_chunk)
            hash_md5.update(encoded_chunk)
    return hash_md5.hexdigest()

Is this a robust way to do this?

It looks like you are hashing text files. Try opening them in text mode like this, then encoding each line before updating the hash:

import hashlib

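def get_file_hash(file_path: str) -> str:
    hash_md5 = hashlib.md5()
    # Text mode with the default newline handling translates '\r\n' to '\n'.
    # encoding="utf-8" assumes the file is UTF-8; without it, the platform
    # default encoding is used, which can differ between Windows and Mac.
    with open(file_path, "rt", encoding="utf-8") as f:
        # Iterate over the file object to read one line at a time.
        for line in f:
            hash_md5.update(line.encode('utf-8'))
    return hash_md5.hexdigest()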

print(get_file_hash('file.txt'))

In text mode with the default newline handling, Python translates '\r\n' line endings to '\n' as it reads, so this should generate the same hash on any platform.
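If decoding the file as text is not wanted (for example, when the encoding is unknown), the same normalization can be done in binary mode. The following is only a sketch under that assumption, not part of the original answer. The function name get_normalized_file_hash and the chunk_size parameter are made up for illustration. The one subtlety is a '\r\n' pair split across two chunks, which is handled by holding back a trailing '\r' until the next chunk arrives.

```python
import hashlib


def get_normalized_file_hash(file_path: str, chunk_size: int = 4096) -> str:
    """Hash a file in binary chunks, treating '\\r\\n' as '\\n'."""
    hash_md5 = hashlib.md5()
    pending_cr = False  # True when the previous chunk ended with b"\r"
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            if pending_cr:
                # Re-attach the carriage return held back from the last chunk.
                chunk = b"\r" + chunk
            pending_cr = chunk.endswith(b"\r")
            if pending_cr:
                # Hold the trailing b"\r" back; it may start a b"\r\n" pair.
                chunk = chunk[:-1]
            hash_md5.update(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        # The file ended with a lone b"\r", so it is hashed unchanged.
        hash_md5.update(b"\r")
    return hash_md5.hexdigest()
```

Unlike text mode, this sketch leaves a lone '\r' untouched and never decodes the data, so it does not depend on the platform's default encoding. It should only be used for files known to be text, since rewriting '\r\n' inside binary data would make different files hash the same.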
