How to open a large file from a URL into memory and create a hash in Python

Question

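I am trying to write a function that opens a file of up to 20 MB from a URL into memory. I need to create a consistent hash of it.

This is the closest I have been able to get.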

import os, hashlib, optparse, requests

def get_remote_sha_sum(url):

  url_file = requests.get(url)
  sha1 = hashlib.sha1()

  with open(url_file, "rb") as f:
    while True:
      data = f.read(65536)
      if not data:
        break
      sha1.update(data)

  return sha1.hexdigest()

if __name__ == '__main__':
  opt = optparse.OptionParser()
  opt.add_option('--url', '-u', default='https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf')

  options, args = opt.parse_args()
  print get_remote_sha_sum(options.url)

But it results in:

TypeError: coercing to Unicode: need string or buffer, Response found

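I have tried dozens of approaches. With BytesIO I ran into the same error message.

How do I open a large file in memory, buffer it, and create a hash?

Please be kind, I am still fairly new to Python.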

Answer 1

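You don't need to split the file into chunks here; 20 MB is not a large file.

For technical reasons I can't use the requests library myself, but string = requests.get(url).text should work with the code below. (For a binary file such as this PDF, requests.get(url).content returns the raw bytes, which can be passed to sha1.update() directly without the encode step.)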

import os, hashlib, optparse, requests

def get_remote_sha_sum(url):

  # url_file = requests.get(url)
  sha1 = hashlib.sha1()
  string = """<html><body style="background-color: rgb(38,38,38); height: 100%; width: 100%; overflow: hidden; margin: 0"><embed width="100%" height="100%" name="plugin" id="plugin" src="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf" type="application/pdf" internalinstanceid="4"></body></html>"""
  sha1.update(string.encode('utf-8'))
  return sha1.hexdigest()

print(get_remote_sha_sum('https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'))
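A note on the .text versus .content choice, as a minimal sketch. It only simulates what happens when binary data is decoded to text and re-encoded before hashing; the sample bytes are made up for illustration, and Latin-1 is an assumed stand-in for whatever encoding an HTTP client might guess.

```python
import hashlib

# Hypothetical sample: a few bytes resembling the start of a PDF,
# including non-ASCII bytes that are not valid UTF-8.
raw = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Hash the raw bytes directly (the kind of data .content would hold).
digest_bytes = hashlib.sha1(raw).hexdigest()

# Simulate a text round trip: decode with a guessed encoding
# (Latin-1 is an assumption here), then encode as UTF-8 before hashing.
text = raw.decode("latin-1")
digest_text = hashlib.sha1(text.encode("utf-8")).hexdigest()

# The round trip changed the bytes, so the two digests no longer match.
print(digest_bytes == digest_text)  # False
```

For a hash that is consistent with the file as it exists on the server, hashing the undecoded bytes is the safer route.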

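Your problem has nothing to do with the size of the file. It is that url_file is a Response object, not the string that open() expects (nor the bytes that sha1.update() needs).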


Solution 1
2 Accepted 2019-02-07 14:36:03
