'utf-8' codec can't decode byte 0x80

Question

I'm trying to download a BVLC-trained model and I'm stuck with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's because of the following function (complete code):

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

Answer 1

You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.

Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require you to pass in bytes:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b in the file mode.
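As a minimal sketch of why the mode matters, assuming Python 3: hashlib accepts bytes but rejects str, so text-mode data could not be hashed directly even if the file happened to decode cleanly. The helper name accepts_for_hashing is made up for illustration and is not part of the original script.

```python
import hashlib


def accepts_for_hashing(data):
    """Return True if hashlib.sha1 accepts `data` (hypothetical helper)."""
    try:
        hashlib.sha1(data)
    except TypeError:
        # Python 3 raises TypeError when handed str instead of bytes.
        return False
    return True


# bytes is what f.read() returns for a file opened with 'rb'
print(accepts_for_hashing(b'model bytes'))
# str is what f.read() returns for a file opened with 'r'
print(accepts_for_hashing('model text'))
```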

See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.
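The update() method also makes it possible to hash a large model file without reading it into memory all at once. The following is only a sketch of that idea: the function name sha1_in_chunks and the 1 MiB chunk size are assumptions, not something the original script uses.

```python
import hashlib


def sha1_in_chunks(path, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a file, reading it in binary chunks.

    Hypothetical helper; chunk_size (1 MiB here) is an arbitrary choice.
    """
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # f.read() returns b'' at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same digest that hashlib.sha1(f.read()).hexdigest() would give for the same file, so it can be compared against the expected sha1 string in the same way.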

Answer 2

You didn't open the file in binary mode, so f.read() tries to decode the file as UTF-8 text, which fails. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it and read it as a binary file.

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

but

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
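If you don't have a compressed file at hand, the same behaviour can be reproduced with a temporary file. This is a sketch under stated assumptions: the payload and the function name reproduce are invented for illustration, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import hashlib
import os
import tempfile


def reproduce():
    """Write a few non-UTF-8 bytes, then read them back in both modes."""
    # Made-up payload: 0x80 can never start a UTF-8 sequence.
    payload = b'model \x80 data'
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(payload)

        # Text mode: reading forces a decode, which fails on byte 0x80.
        try:
            with open(path, 'r', encoding='utf-8') as f:
                f.read()
            text_mode_failed = False
        except UnicodeDecodeError:
            text_mode_failed = True

        # Binary mode: the raw bytes go straight into the hash.
        with open(path, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
    finally:
        os.remove(path)
    return text_mode_failed, digest


print(reproduce())
```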

Answer 3

Since there is not a single hint in the documentation or the source code, I have no clue why, but using the b character (I guess for binary) totally works (tf-version: 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile

3 answers

Answer 1: 15, accepted, 2016-04-24 17:02:08

Answer 2: 5, 2016-04-24 17:01:24

Answer 3: 2, 2017-05-13 10:14:31
