
How to verify the integrity of files using a digest in Python (SHA256SUMS)

I have a set of files and a SHA256SUMS digest file that contains a SHA-256 hash for each of the files. What's the best way to verify the integrity of my files with Python?

For example, here's how I would download the SHA256SUMS digest file for the Debian 10 (buster) installer images, then download and verify the MANIFEST file it lists, in Bash:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do this same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in Python?

You can calculate the SHA-256 hash of each file as described in this blog post:

https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html

A sample implementation to generate a new manifest file may look like:

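import hashlib
from pathlib import Path

# Your output file
output_file = Path("manifest-check")

# Your target directory
p = Path('.')

with open(output_file, "w") as out:
    # Iterate recursively over everything under the target directory
    for path in sorted(p.glob("**/*")):
        # Process regular files only (no subdirs) and skip the output file itself
        if path.is_file() and path.resolve() != output_file.resolve():
            # Use a fresh hash object for every file
            sha256_hash = hashlib.sha256()
            with open(path, "rb") as f:
                # Read the file in 4 KiB chunks to keep memory use low
                for byte_block in iter(lambda: f.read(4096), b""):
                    sha256_hash.update(byte_block)
            # Same layout as `sha256sum` output: digest, two spaces, path
            out.write(f"{sha256_hash.hexdigest()}  {path}\n")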
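To verify files against an existing SHA256SUMS file rather than generate one, the behaviour of `sha256sum --check --ignore-missing SHA256SUMS` from the question can be approximated in a few lines. This is a minimal sketch, assuming the standard `sha256sum` line layout (64 hex digits, a space, a `' '` or `'*'` mode character, then the path) and that the listed paths are relative to the directory holding SHA256SUMS. The names `sha256_of` and `check_sums` are hypothetical, and returning False when none of the listed files exist locally is a choice made here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def check_sums(sums_path):
    """Hypothetical helper approximating `sha256sum --check --ignore-missing`.

    Prints "<path>: OK" or "<path>: FAILED" for every listed file that exists
    locally, silently skips listed files that are missing, and returns True
    only if at least one file was checked and every checked file matched.
    """
    sums_path = Path(sums_path)
    base_dir = sums_path.parent
    checked = 0
    all_ok = True
    with open(sums_path, encoding="utf-8") as fd:
        for line in fd:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            # Layout: 64 hex digits, one space, mode char (' ' or '*'), path
            expected, name = line[:64].lower(), line[66:]
            target = base_dir / name
            if not target.is_file():
                continue  # like --ignore-missing: skip files we do not have
            checked += 1
            ok = sha256_of(target) == expected
            all_ok = all_ok and ok
            print(f"{name}: {'OK' if ok else 'FAILED'}")
    return checked > 0 and all_ok
```

With SHA256SUMS and MANIFEST in the current directory, `check_sums("SHA256SUMS")` should print `./MANIFEST: OK` and return True. Unlike a basename-keyed dictionary, this resolves each entry by its full relative path, so files with the same name in different subdirectories do not overwrite each other.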

Alternatively, the manifest-checker pip package appears to do this.

You can look at its source at https://github.com/TonyFlury/manifest-checker and adapt it for Python 3.
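The `wget` download step from the question can also be done with the standard library alone. The sketch below uses `urllib.request.urlopen` to fetch SHA256SUMS and the target file, then checks the target's digest; it assumes no retries or proxy handling are needed and that the target is listed either as `MANIFEST` or `./MANIFEST` (the transcript in the question shows the latter). The helper names `download` and `download_and_verify` are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

BASE_URL = ("http://ftp.nl.debian.org/debian/dists/buster/main/"
            "installer-amd64/current/images/")


def download(url, dest):
    """Stream the resource at `url` into the local file `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def download_and_verify(base_url, name, dest_dir="."):
    """Fetch SHA256SUMS and `name` from `base_url`, then check `name`.

    Returns True only if `name` is listed in SHA256SUMS and its digest matches.
    """
    dest_dir = Path(dest_dir)
    sums_file = dest_dir / "SHA256SUMS"
    target = dest_dir / name
    download(base_url + "SHA256SUMS", sums_file)
    download(base_url + name, target)

    # Find the expected digest; entries look like "<64 hex>  ./MANIFEST"
    expected = None
    with open(sums_file, encoding="utf-8") as fd:
        for line in fd:
            line = line.rstrip("\r\n")
            if len(line) >= 67 and line[66:] in (name, "./" + name):
                expected = line[:64].lower()
                break
    if expected is None:
        return False  # not listed, so it cannot be verified

    # Hash the downloaded file in chunks and compare
    h = hashlib.sha256()
    with open(target, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest() == expected
```

Calling `download_and_verify(BASE_URL, "MANIFEST")` mirrors the two `wget` calls plus the `sha256sum` check. Keep in mind that fetching SHA256SUMS over plain HTTP from the same server detects corruption, not deliberate tampering.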

The following python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified, and it returns False if any of the files couldn't be verified and True otherwise.

#!/usr/bin/env python3
from hashlib import sha256
import os

# Takes the path (as a string) to a SHA256SUMS file and a list of paths to
# local files. Returns True only if all files' checksums are present in the
# SHA256SUMS file and their checksums match
def integrity_is_ok( sha256sums_filepath, local_filepaths ):

    # first we parse the SHA256SUMS file and convert it into a dictionary
    sha256sums = dict()
    with open( sha256sums_filepath ) as fd:
        for line in fd:
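            # skip blank lines
            if not line.strip():
                continue

            # sha256 hashes are exactly 64 characters long
            checksum = line[0:64]

            # there is one space followed by one mode character (' ' for text,
            # '*' for binary) between the checksum and the filename in the
            # `sha256sum` output. Only the basename is kept, so if two listed
            # files share a name in different directories, the last one wins
            filename = os.path.split( line[66:] )[1].strip()
            sha256sums[filename] = checksum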

    # now loop through each file that we were asked to check and confirm its
    # checksum matches what was listed in the SHA256SUMS file
    for local_file in local_filepaths:

        local_filename = os.path.split( local_file )[1]

        sha256sum = sha256()
        with open( local_file, 'rb' ) as fd:
            data_chunk = fd.read(1024)
            while data_chunk:
                sha256sum.update(data_chunk)
                data_chunk = fd.read(1024)

        checksum = sha256sum.hexdigest()
        # a file missing from SHA256SUMS counts as a failure instead of
        # raising KeyError
        if checksum != sha256sums.get( local_filename ):
            return False

    return True

if __name__ == '__main__':

    script_dir = os.path.split( os.path.realpath(__file__) )[0]
    sha256sums_filepath = script_dir + '/SHA256SUMS'
    local_filepaths = [ script_dir + '/MANIFEST' ]

    if integrity_is_ok( sha256sums_filepath, local_filepaths ):
        print( "INFO: Checksum OK" )
    else:
        print( "ERROR: Checksum Invalid" )

Here is an example execution:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    

2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ ./sha256sums_python.py 
INFO: Checksum OK
user@host:~$ 

Parts of the above code were adapted from the following answer on Ask Ubuntu:

Python 3.11 added hashlib.file_digest()

https://docs.python.org/3.11/library/hashlib.html#file-hashing

Generating the digest for a file:

import hashlib

with open("my_file", "rb") as f:
    digest = hashlib.file_digest(f, "sha256")
    s = digest.hexdigest()

Compare s against the corresponding entry in SHA256SUMS.
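A minimal sketch of that comparison, assuming the SHA256SUMS lines follow `sha256sum`'s layout and that the path you pass is relative to the directory containing SHA256SUMS. The names `file_sha256` and `matches_sha256sums` are hypothetical, and the fallback branch for interpreters older than 3.11 (which lack `hashlib.file_digest`) is an addition here, not part of the answer above.

```python
import hashlib
from pathlib import Path, PurePosixPath


def file_sha256(path):
    """Hex SHA-256 of a file, via hashlib.file_digest() when it exists."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Fallback for older interpreters: hash in chunks by hand
        h = hashlib.sha256()
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
        return h.hexdigest()


def matches_sha256sums(sums_path, relative_path):
    """True if `relative_path` is listed in `sums_path` with a matching digest.

    PurePosixPath normalises "./MANIFEST" and "MANIFEST" to the same path.
    Raises FileNotFoundError if the entry is listed but the file is absent.
    """
    wanted = PurePosixPath(relative_path)
    with open(sums_path, encoding="utf-8") as fd:
        for line in fd:
            line = line.rstrip("\r\n")
            if len(line) >= 67 and PurePosixPath(line[66:]) == wanted:
                local = Path(sums_path).parent / relative_path
                return file_sha256(local) == line[:64].lower()
    return False  # not listed in SHA256SUMS
```

For the Debian example, `matches_sha256sums("SHA256SUMS", "MANIFEST")` would be expected to return True after both downloads.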
