'utf-8' codec can't decode byte 0x89

Question

I want to read a CSV file and process some columns, but I keep running into problems. I'm stuck with the following error:

Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte
>>>

My code:

import csv
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv",newline='', encoding="utf8") as f:
    reader = csv.reader(f,delimiter=';',quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))

I'm a beginner with Python. I tried googling, but it's hard to find the right solution.

Can anyone help me out?

Answer 1

This is the most important clue:

invalid start byte

\x89 is not, as suggested in the comments, an invalid UTF-8 byte. It is a completely valid continuation byte, meaning that if it follows the correct lead byte, it is part of valid UTF-8:

http://hexutf8.com/?q=0xc90x89

So either (1) you do not have UTF-8 data as you expect, or (2) you have some malformed UTF-8 data. The Python codec is simply letting you know that it encountered \x89 at a position where the start of a character was expected.

(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout )
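
To make the start byte versus continuation byte point concrete, here is a small sketch. The two byte strings are made up for illustration and are not taken from the asker's file: the first pairs 0x89 with the lead byte 0xC9 (as in the link above), the second decodes 0x89 on its own.

```python
# 0xC9 is a two-byte lead byte, so the continuation byte 0x89 may follow it.
valid = b"\xc9\x89".decode("utf-8")

# On its own, 0x89 has no lead byte in front of it and cannot start a character.
try:
    b"\x89".decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason

print(repr(valid), "|", reason)
```

The second case fails with the same "invalid start byte" reason that appears in the traceback in the question.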

Answer 2

The first byte of a .PNG file is 0x89. I'm not saying that is your problem, but the .PNG header is specifically designed so that it is NOT accidentally interpreted as text.

Why you would have a .csv file that is actually a .png I don't know. But it definitely could happen if someone accidentally renamed the file. On Windows 10, every once in a while I mass-rename files by accident because of their stupid checkbox feature. Why Microsoft decided that desktop machines having identical UI controls to tablets was a good idea... I don't know.
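
If you want to rule this out quickly, a sketch like the following compares the start of the file with the 8-byte PNG signature. The helper name looks_like_png is made up for this example, and you would pass it the path of your own file.

```python
# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Return True if the file at `path` begins with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
```

In the question the error is reported at position 446 rather than at position 0, so a renamed PNG is probably not the cause there, but the check is cheap.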

Answer 3

I was getting a similar error when trying to read or upload the following kinds of files:

CSV File
JPEG File
PNG File
Zip File

The best way to avoid errors like:

'utf-8' codec can't decode byte 0x89
'utf-8' codec can't decode byte 0xff

is to read these files as bytes. When you treat them as bytes, you do not need to provide an encoding. So when you open them, you should specify:

with open(file_path, 'rb') as file:

Or in your case, the code should be something like:

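import csv
import io

# Binary mode accepts no encoding or newline argument.
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", 'rb') as f:
    raw = f.read()

# csv.reader needs text in Python 3; cp1252 is only a guess at the real encoding.
text = raw.decode('cp1252', errors='replace')
reader = csv.reader(io.StringIO(text, newline=''), delimiter=';', quotechar='|')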
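
Once the file is read as bytes, you can also look at what actually surrounds the byte the codec complained about. The following is only a sketch: first_undecodable is a hypothetical helper name, and the sample bytes are invented rather than taken from the real file.

```python
def first_undecodable(raw, encoding="utf-8", context=8):
    """Return (position, surrounding bytes) of the first decode failure, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, raw[lo:exc.start + context]
    return None


# Invented sample: plain ASCII with a stray 0x89 in the middle.
sample = b"abc;def\x89ghi"
print(first_undecodable(sample))
```

Running it on the bytes of the real file should report position 446, and the surrounding bytes may hint at whether the file is text in another encoding or not text at all.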

Answer 1
5 2014-12-02 06:01:04

Answer 2
3 2020-09-08 17:06:34

Answer 3
1 2021-12-14 08:16:15
