I keep getting UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte

Question

I'm trying to read a folder of monthly CSV files, but for some reason I keep getting this error.

This is my code:

import os

import pandas as pd

df = pd.DataFrame()

for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
        df = pd.concat([df, pd.read_csv(os.path.join("Performance_Data", file))], axis=0)

df.head()

What do I do?

Answer 1

By default, pandas assumes that your file is encoded in UTF-8. Your file appears to be encoded in Windows-1252 instead: byte 0x92 is the right single quotation mark (’) in Windows-1252, but it is not a valid start byte in UTF-8, which is exactly what the error message reports. You can tell pandas to use this encoding with:

pd.read_csv(os.path.join("Performance_Data", file), encoding='cp1252')
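To see why the encoding matters, here is a minimal sketch that needs only the standard library. The sample bytes are made up for illustration; they simply place 0x92 at position 1, as in the error message above.

```python
# Hypothetical sample: "I’m" as it would be stored in a Windows-1252 file.
raw = b"I\x92m"

# Decoding as UTF-8 fails, because 0x92 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

# Decoding as Windows-1252 (cp1252) maps 0x92 to the right single quote.
decoded = raw.decode("cp1252")

print(utf8_error)
print(decoded)
```

The same thing happens inside read_csv: the bytes on disk are fine, and only the codec used to interpret them needs to change.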

Detecting the encoding of a file automatically is a bit tricky, but you can use a package called "chardet". Note that its result is only a guess. For your code, it could look like this:

import os

import chardet
import pandas as pd

df = pd.DataFrame()

for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
        # os.listdir returns bare file names, so build the full path first.
        path = os.path.join("Performance_Data", file)
        # Read the raw bytes and let chardet guess the encoding.
        with open(path, "rb") as fp:
            encoding = chardet.detect(fp.read())["encoding"]
        df = pd.concat(
            [df, pd.read_csv(path, encoding=encoding)],
            axis=0,
        )

df.head()
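If you would rather not add a dependency, one alternative is to try UTF-8 first and fall back to Windows-1252 only when decoding fails. The sketch below is a suggestion rather than part of the original answer: the helper names read_csv_with_fallback and load_folder are hypothetical, and the fallback order assumes your files are either UTF-8 or Windows-1252. It also collects the frames in a list and concatenates once, which avoids repeatedly copying the growing DataFrame.

```python
import os

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in order and return the first successful parse."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    # Every candidate failed: surface the last decoding error.
    raise last_error


def load_folder(folder):
    """Read every .csv file in `folder` into a single DataFrame."""
    frames = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".csv"):
            frames.append(read_csv_with_fallback(os.path.join(folder, name)))
    # pd.concat raises on an empty list, so handle the no-files case.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, axis=0, ignore_index=True)
```

With this in place, load_folder("Performance_Data").head() would replace the loop. Keep in mind that Windows-1252 accepts almost any byte sequence, so a file in some other encoding may load without an error but with wrong characters.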

References

Pandas read_csv documentation.
"UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 377826: invalid start byte", a relevant earlier question on Stack Overflow.

Answer 1
0 2021-12-16 14:46:18
