'utf-8' codec can't decode byte 0x89
I want to read a CSV file and process some columns, but I keep running into problems. I'm stuck with the following error:
Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte
My code:
import csv
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", newline='', encoding="utf8") as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))
I'm a beginner with Python and tried googling, but it's hard to find the right solution.
Can anyone help me out?
This is the most important clue:
invalid start byte
\x89 is not, as suggested in the comments, an invalid UTF-8 byte. It is a perfectly valid continuation byte, meaning that if it follows the right lead byte, it forms valid UTF-8:
http://hexutf8.com/?q=0xc90x89
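A minimal check of this claim using only the standard library: the byte 0x89 on its own is rejected with exactly the reason shown in the traceback, while the two-byte sequence 0xC9 0x89 decodes to the single character U+0249.

```python
# 0x89 on its own: a continuation byte with no lead byte in front of it
reason = None
try:
    b"\x89".decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason

# 0xC9 starts a two-byte sequence; followed by 0x89 it is valid UTF-8
char = b"\xc9\x89".decode("utf-8")

print(reason)          # invalid start byte
print(hex(ord(char)))  # 0x249
```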
So either (1) you do not have UTF-8 data as you expect, or (2) you have some malformed UTF-8 data. The Python codec is simply letting you know that it encountered \x89 somewhere it is not allowed: as the first byte of a character instead of after a lead byte.
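To see which bytes the codec tripped over, one option is to catch the UnicodeDecodeError and look at the bytes around exc.start. The helper name show_bad_bytes, the context width and the sample input are hypothetical; the exception attributes start, end and reason are standard.

```python
def show_bad_bytes(data: bytes, context: int = 10) -> str:
    """Hypothetical helper: decode as UTF-8 and, on failure, describe the offending bytes."""
    try:
        data.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as exc:
        # exc.start/exc.end delimit the bytes the codec rejected
        start = max(exc.start - context, 0)
        window = data[start:exc.end + context]
        return f"{exc.reason} at position {exc.start}: {window!r}"


# Made-up sample: a CSV-like line with a stray 0x89 byte
print(show_bad_bytes(b"abc;\x89PNG;def"))
```

With the real file you would pass its raw bytes, for example show_bad_bytes(open(path, 'rb').read()), and inspect what surrounds position 446.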
(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout )
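As a rough sketch of that code page layout, this hypothetical helper classifies a single byte by its value range. It looks at one byte in isolation, so it cannot tell whether a continuation byte is actually in a valid position.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte value (0-255) by its role in the UTF-8 code page layout."""
    if b < 0x80:
        return "ASCII (single-byte character)"
    if b < 0xC0:
        return "continuation byte"  # 10xxxxxx: only valid right after a lead byte
    if b < 0xC2:
        return "invalid (overlong lead byte)"  # 0xC0 and 0xC1 never appear in valid UTF-8
    if b < 0xE0:
        return "lead byte of a 2-byte sequence"
    if b < 0xF0:
        return "lead byte of a 3-byte sequence"
    if b < 0xF5:
        return "lead byte of a 4-byte sequence"
    return "invalid (never used in UTF-8)"  # 0xF5-0xFF


for byte in (0x41, 0x89, 0xC9):
    print(hex(byte), utf8_byte_role(byte))
```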
The first byte of a .PNG file is 0x89. I'm not saying that is your problem, but the .PNG header is specifically designed so that it is NOT accidentally interpreted as text.
Why you would have a .csv file that is actually a .png, I don't know. But it could definitely happen if someone accidentally renamed the file. On Windows 10, every once in a while I accidentally mass-rename files because of their stupid checkbox feature. Why Microsoft decided that giving desktop machines the same UI controls as tablets was a good idea... I don't know.
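If you suspect a renamed image, a quick check is to compare the first 8 bytes of the file with the fixed PNG signature (0x89, then "PNG", CR LF, 0x1A, LF). The function names below are hypothetical, and a matching signature only suggests, rather than proves, that the file is a PNG.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature every PNG file starts with


def is_png_header(data: bytes) -> bool:
    """True if data starts with the PNG signature."""
    return data[:len(PNG_SIGNATURE)] == PNG_SIGNATURE


def looks_like_png(path: str) -> bool:
    """Hypothetical helper: read only the first 8 bytes of the file, in binary mode."""
    with open(path, "rb") as f:
        return is_png_header(f.read(len(PNG_SIGNATURE)))
```

For example, looks_like_png(r"c:\Users\Sven\Desktop\relaties 24112014.csv") returning True would point to a misnamed file rather than an encoding problem.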
I was also getting a similar error when trying to read or upload the following kinds of files:
The best way to avoid errors like:
is to read these files as bytes. When you treat them as bytes, you don't need to provide any encoding value. So when you open them, you should specify:
with open(file_path, 'rb') as file:
    data = file.read()  # a bytes object; nothing is decoded, so no UnicodeDecodeError
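A small self-contained sketch of the difference between the two modes, using a temporary file with made-up contents: binary mode hands back the raw bytes without decoding, while text mode with encoding="utf-8" fails on the same data. Note that csv.reader in Python 3 expects text, so bytes read this way still have to be decoded before they can be parsed as CSV.

```python
import os
import tempfile

# Write a small file containing the byte 0x89, which is not valid UTF-8 on its own
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"a;b\n\x89;c\n")
    path = tmp.name

# Binary mode returns raw bytes; no codec runs, so there is no UnicodeDecodeError
with open(path, "rb") as f:
    raw = f.read()

# Text mode with encoding="utf-8" decodes while reading and fails on 0x89
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False

os.remove(path)
print(raw, text_mode_ok)
```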
Or, in your case, the code should be something like this:
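import csv
import io
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", 'rb') as f:  # binary mode accepts no encoding or newline argument
    raw = f.read()
# csv.reader needs str, so the bytes still have to be decoded ('latin-1' is an assumed encoding)
reader = csv.reader(io.StringIO(raw.decode('latin-1'), newline=''), delimiter=';', quotechar='|')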
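If you don't know the file's encoding, one common heuristic is to try a few candidate encodings in order and parse with the first that succeeds. The helper read_csv_rows, the candidate list and the sample bytes are assumptions: cp1252 is only a guess for a CSV exported on Windows, and latin-1 accepts any byte sequence, so it acts as a last resort that never fails.

```python
import csv
import io


def read_csv_rows(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: decode with the first encoding that works, then parse as CSV."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Only reachable if latin-1 (which never fails) is removed from the list
        raise ValueError("none of the candidate encodings could decode the data")
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar="|")
    return enc, list(reader)


# Made-up sample: UTF-8 decoding fails on 0x89, cp1252 maps it to a per-mille sign
enc, rows = read_csv_rows(b"name;phone\nJan;(0032) 3.123\n\x89;x\n")
print(enc, rows)
```

A successful decode does not prove the encoding is right: a wrong guess produces garbled characters rather than an error, so check a few decoded rows by eye.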