
'utf-8' codec can't decode byte 0x80

I'm trying to download a BVLC-trained model and I'm stuck with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's because of the following function (complete code):

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?

You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.

Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require that you pass in bytes:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the addition of b in the file mode.
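As a self-contained illustration of the fix, here is a minimal sketch that checks a file against an expected SHA1 digest. The helper name sha1_matches, the temporary file, and the sample payload are assumptions made for this example; they are not part of the original download script.

```python
import hashlib
import os
import tempfile


def sha1_matches(filename, expected_sha1):
    """Hypothetical helper: True if the file's SHA1 hex digest equals expected_sha1."""
    # 'rb' returns raw bytes, so no text decoding is attempted.
    with open(filename, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest() == expected_sha1


# Sample payload (an assumption) that starts with 0x80, which is not valid UTF-8.
payload = b'\x80\x02binary model bytes'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    model_path = tmp.name

expected = hashlib.sha1(payload).hexdigest()
ok = sha1_matches(model_path, expected)    # digest of the same bytes
bad = sha1_matches(model_path, '0' * 40)   # deliberately wrong digest
os.remove(model_path)
print(ok, bad)
```

Because the file is read as bytes, the check works for any file content, text or not.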

See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)

and from the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.
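The update() method mentioned in that quote also makes it possible to hash a large model file without loading it into memory at once. The following is only a sketch: the function name sha1_of_stream and the 64 KiB chunk size are assumptions chosen for illustration, and an in-memory io.BytesIO stands in for a file opened with 'rb'.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Hypothetical helper: SHA1 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    # read() returns b'' at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in data (an assumption); a real call would pass open(filename, 'rb').
data = b'\x80' * 200000
chunked = sha1_of_stream(io.BytesIO(data))
print(chunked == hashlib.sha1(data).hexdigest())
```

Feeding the data in pieces gives the same digest as hashing it in one call, so the comparison against the expected SHA1 is unchanged.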

You didn't specify to open the file in binary mode, so f.read() is trying to read the file as a UTF-8-encoded text file, which doesn't seem to be working. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

but

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
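To reproduce the difference shown in the session above without the original test.h5.bz2 file, a small sketch like the following can be used. The helper try_text_read and the sample bytes are assumptions for illustration; the encoding is passed explicitly as 'utf-8' so the result does not depend on the platform default, which the script prints for reference.

```python
import locale
import os
import tempfile


def try_text_read(path, encoding=None):
    """Hypothetical helper: return the decoded text, or the UnicodeDecodeError raised."""
    try:
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc


# The encoding open() falls back to in text mode when none is given.
print("default text encoding:", locale.getpreferredencoding(False))

# Sample file (an assumption) containing 0x80, which can never start a UTF-8 character.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abc\x80def')
    sample_path = tmp.name

utf8_result = try_text_read(sample_path, encoding='utf-8')  # text mode: decoding fails
with open(sample_path, 'rb') as f:                          # binary mode: no decoding
    raw = f.read()
os.remove(sample_path)

print(type(utf8_result).__name__)
print(raw)
```

In text mode the read fails at the offending byte, while binary mode returns the bytes unchanged, ready to be passed to hashlib.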

There is no hint about this in the documentation or the source code, so I can't say why, but using the b character (for binary, I guess) works (TensorFlow version 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile


