'utf-8' codec can't decode byte 0xd5 in position 2912: invalid continuation byte error when reading a csv file in Python

Question

I am looping through the rows of a csv file and hit this error partway through: 'utf-8' codec can't decode byte 0xd5 in position 2912: invalid continuation byte

I'm just trying to get the row count for the file with this function:

import csv

def count_lines(filename):
    row_stored = ""
    try:
        # No encoding is passed, so open() uses the platform default.
        with open(filename) as csvfile:
            data_reader = csv.reader(csvfile)
            next(data_reader)  # skip the header row
            count = 0
            for index, row in enumerate(data_reader):
                if index == 1220119:
                    print(row)
                row_stored = row  # last row that was read successfully
                count += 1
            return count
    except Exception as e:
        print(f'There was a problem with your request: {e}\n', row_stored)
        return False

The row above the erroring row looks like this:

['817949019495', 'QMMZN1300568', '4/28/2017', 'Digital Revenue', 'Track', 'Download Europe', 'GB', 'Amazon International - UK', '', '2', '1.2126506333579932', '109926407', '2/28/2017']

And the row that throws the error looks like this:

['817949019495', 'QMMZN1300568', '4/28/2017', 'Digital Revenue', 'Track', 'Download Europe', 'GB', 'Amazon International - UK', '', '2', '1.2126506333579932', '109926407', '2/28/2017']

I don't see any differences between the two. Is there something about the formatting of this particular row that I'm not seeing?

Note: This csv file is 3.17 GB. I don't know if that's a contributing factor.

Answer 1

Changing the encoding solved the problem:

with open(filename, encoding="ISO-8859-1") as csvfile:
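
As a rough illustration of why this works: 0xd5 is a valid UTF-8 lead byte only when a continuation byte follows it, whereas ISO-8859-1 maps every possible byte to a character and so never raises. The sketch below reproduces the error on a small in-memory sample. The sample bytes and the helper name count_rows are made up for the example, and it is only an assumption that ISO-8859-1 is the file's real encoding (another single-byte encoding such as Windows-1252 or Mac Roman would also decode without error but could show different characters).

```python
import csv
import io

# Hypothetical sample: one clean row, and one row containing the byte 0xd5
# followed by an ASCII letter, which is not valid UTF-8.
raw = b"id,title\n1,Track One\n2,Don\xd5t Stop\n"

def count_rows(data: bytes, encoding: str) -> int:
    """Count data rows (header excluded) after decoding `data` with `encoding`."""
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    reader = csv.reader(text_stream)
    next(reader)  # skip the header row
    return sum(1 for _ in reader)

try:
    count_rows(raw, "utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(f"UTF-8 failed: {exc}")

# ISO-8859-1 assigns a character to every byte value, so decoding cannot fail.
latin1_count = count_rows(raw, "ISO-8859-1")
print(latin1_count)
```

Because ISO-8859-1 accepts any byte sequence, a successful read shows only that decoding did not fail, not that the non-ASCII characters were interpreted as the file's author intended.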

Answer 2

Here's evidence that changing the encoding worked. I'm still not sure what caused the error, but the data remained unaltered.
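
One way to narrow down the cause is to read the file in binary mode and try to decode each line separately. In the question's function, row_stored holds the last row that decoded successfully, which would explain why the "erroring" row looked identical to the one before it. This is a minimal sketch: the helper name find_undecodable_lines and the temporary sample file are hypothetical. It also assumes no quoted field contains an embedded newline, so physical line numbers line up with csv rows.

```python
import os
import tempfile

def find_undecodable_lines(path, encoding="utf-8", limit=5):
    """Return (line_number, raw_bytes) for the first `limit` lines that fail to decode."""
    bad = []
    with open(path, "rb") as f:  # binary mode: nothing is decoded while reading
        for line_number, raw_line in enumerate(f, start=1):
            try:
                raw_line.decode(encoding)
            except UnicodeDecodeError:
                bad.append((line_number, raw_line))
                if len(bad) >= limit:
                    break
    return bad

# Demo on a small, made-up file; in practice, pass the path of the real csv.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,title\n1,Track One\n2,Don\xd5t Stop\n")
    sample_path = tmp.name

bad_lines = find_undecodable_lines(sample_path)
os.remove(sample_path)
print(bad_lines)
```

Iterating line by line keeps memory use low, so the same approach should be workable on a multi-gigabyte file.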


Answer 1
0 2021-11-15 15:52:03

Answer 2
0 2021-11-16 00:03:04
