Python: Hashing a local PDF file vs. the downloaded object

Question

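I want to check whether the contents of a PDF on a web server are the same as the contents of a PDF on my computer. I tried the following, but it didn't work: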

>>> import requests, hashlib
>>> pdf = requests.get('<http link to pdf file>')
>>> type(pdf.content)
<class 'bytes'>
>>> type(repr(open('file.pdf','rb')).encode('utf-8'))
<class 'bytes'>
>>> hashlib.sha256(repr(open('file.pdf','rb')).encode('utf-8')) == hashlib.sha256(repr(pdf.content).encode('utf-8')).hexdigest()
False
>>> hashlib.sha256(repr(open('file.pdf','rb')).encode('utf-8')) == hashlib.sha256(pdf.content).hexdigest()
False

Answer 1

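You are hashing the UTF-8-encoded repr of the file object, not the contents of the file. There is no reason to use repr here anyway; hash the contents directly.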

>>> with open('file.pdf', 'rb') as f:
...     h1 = hashlib.sha256(f.read()).digest()
>>> h2 = hashlib.sha256(pdf.content).digest()
>>> h1 == h2
True
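For large PDFs, the same comparison can be done without reading the whole file into memory at once. The sketch below is an illustration, not part of the original answer: the helper names `sha256_of_file` and `matches_download` and the 64 KiB chunk size are hypothetical choices, and `downloaded` stands for the bytes you would get from `pdf.content`. It compares hex digest strings on both sides, so the comparison is like-for-like.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hypothetical helper: hash a file's bytes in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read() until EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_download(path, downloaded):
    """Hypothetical helper: compare a local file with downloaded bytes."""
    # Both sides are hex strings; a hash object never equals a str.
    return sha256_of_file(path) == hashlib.sha256(downloaded).hexdigest()
```

Usage would look like `matches_download('file.pdf', pdf.content)`. On Python 3.11 and later, `hashlib.file_digest(f, 'sha256')` performs the chunked read for you on a file opened in binary mode.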

Answer 2

The first hash is the hash of the file object's representation (not of its contents):

repr(open('file.pdf','rb'))  
    # "<_io.BufferedReader name='file.pdf'>"
repr(open('file.pdf','rb')).encode('utf-8')  
    # b"<_io.BufferedReader name='file.pdf'>"

So your first hash is computed over the bytes b"<_io.BufferedReader name='file.pdf'>", not over the PDF data.
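A small self-contained experiment makes the point concrete. This sketch is an illustration, not taken from either answer; the helper names `repr_digest` and `content_digest` are hypothetical, and a temporary directory stands in for your real `file.pdf`. It writes two different contents to the same path: the repr-based digest stays the same because the repr only contains the file name, while the content digest changes. It also shows a second problem in the question's code: the left-hand side there is a hash object (no `.hexdigest()` call), and a hash object never compares equal to a hex string.

```python
import hashlib
import os
import tempfile

def repr_digest(path):
    """Hypothetical helper mimicking the question: hashes repr(file object)."""
    with open(path, "rb") as f:
        return hashlib.sha256(repr(f).encode("utf-8")).hexdigest()

def content_digest(path):
    """Hypothetical helper: hashes the bytes actually stored in the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Write two different contents to the same path in turn.
path = os.path.join(tempfile.mkdtemp(), "file.pdf")
digests = []
for data in (b"first version", b"second version"):
    with open(path, "wb") as f:
        f.write(data)
    digests.append((repr_digest(path), content_digest(path)))

# The repr only mentions the name, so its hash ignores the contents.
print(digests[0][0] == digests[1][0])  # True
print(digests[0][1] == digests[1][1])  # False

# A hash object is never equal to a hex string.
print(hashlib.sha256(b"x") == hashlib.sha256(b"x").hexdigest())  # False
```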

2 answers

Answer 1
Score: 2 · Accepted

Answer 2
Score: 2 · 2017-07-13 16:13:29