'utf-8' codec can't decode byte 0x80
I'm trying to download the BVLC-trained model and I'm stuck with this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte
I think it's because of the following function (complete code):
# Closure-d function for checking SHA1.
def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
    with open(filename, 'r') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1
Any idea how to fix this?
You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.
Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require that you pass in bytes:
with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1
Note the addition of b in the file mode.
See the open() documentation:
mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)
and from the hashlib module documentation:
You can now feed this object with bytes-like objects (normally bytes) using the update() method.
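As a rough sketch of what the update() note makes possible: the file can be hashed in chunks instead of with a single f.read(), which avoids holding a large model file in memory at once. The helper name sha1_of_file and the 1 MiB chunk size are my own choices for illustration; they are not part of the original script.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha1()
    # 'rb' makes read() return bytes, which is what hashlib expects.
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b'' at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

With such a helper, the check in the question would reduce to comparing sha1_of_file(filename) against the expected sha1 string.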
You didn't specify to open the file in binary mode, so f.read() is trying to read the file as a UTF-8-encoded text file, which fails because the contents are not valid UTF-8. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.
>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte
but
>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
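If you don't have a binary file at hand, the same behaviour can probably be reproduced with a throwaway file. This is only a sketch: the sample bytes are made up, and encoding='utf-8' is passed explicitly so that the result does not depend on the platform's locale encoding.

```python
import hashlib
import os
import tempfile

# Hypothetical sample data: 0x80 is never a valid first byte of a UTF-8 sequence.
data = b'\x80\x02binary payload'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Text mode: read() has to decode the bytes first, and that step fails.
    try:
        with open(path, 'r', encoding='utf-8') as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: nothing is decoded, so the raw bytes can be hashed directly.
    with open(path, 'rb') as f:
        binary_digest = hashlib.sha1(f.read()).hexdigest()
finally:
    os.remove(path)

print(type(text_mode_error).__name__)                   # UnicodeDecodeError
print(binary_digest == hashlib.sha1(data).hexdigest())  # True
```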
Since there is not a single hint in the documentation or the source code, I have no clue why, but using the b character (I guess it stands for binary) totally works (tf-version: 1.1.0):
image_data = tf.gfile.FastGFile(filename, 'rb').read()