I keep getting UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte
I'm trying to read monthly CSV files, but for some reason I keep getting this error.
This is my code:
import os
import pandas as pd

df = pd.DataFrame()
for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
        # read_csv decodes the file as UTF-8 unless told otherwise
        df = pd.concat([df, pd.read_csv(os.path.join("Performance_Data", file))], axis=0)
df.head()
What do I do?
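Pandas assumes by default that your file is encoded in UTF-8, but your file is most likely encoded in Windows-1252 (cp1252). In Windows-1252, byte 0x92 is the right single quotation mark (’). In UTF-8, bytes 0x80 to 0xBF can only appear as continuation bytes inside a multi-byte character, never at its start, which is why the error says "invalid start byte".
You can tell pandas to use Windows-1252 by passing the encoding argument: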
pd.read_csv(os.path.join("Performance_Data", file), encoding='cp1252')
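To see the failure and the fix side by side, here is a small self-contained sketch. It builds a CSV in memory and encodes it as Windows-1252; the sample data is made up for illustration. The curly apostrophe becomes byte 0x92, the default UTF-8 read fails on it, and passing encoding="cp1252" decodes it correctly.

```python
import io

import pandas as pd

# Made-up sample data: encoding to cp1252 stores the curly apostrophe (U+2019) as byte 0x92.
raw = "name,value\nJan\u2019s report,1\n".encode("cp1252")
assert b"\x92" in raw

# Reading with the default UTF-8 encoding fails on that byte.
utf8_failed = False
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("UTF-8 failed:", exc)

# Passing the actual encoding decodes the byte correctly.
df = pd.read_csv(io.BytesIO(raw), encoding="cp1252")
print(df.loc[0, "name"])  # Jan’s report
```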
Detecting a file's encoding automatically is a bit tricky, but you can use a package called chardet, which guesses the encoding from the file's raw bytes. For your code, it could look like this:
import os

import chardet
import pandas as pd

df = pd.DataFrame()
for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
        # Open the file inside Performance_Data, not in the current directory
        path = os.path.join("Performance_Data", file)
        # chardet guesses the encoding from the raw bytes; the guess can be wrong
        with open(path, "rb") as fp:
            encoding = chardet.detect(fp.read())["encoding"]
        df = pd.concat(
            [
                df,
                pd.read_csv(path, encoding=encoding),
            ],
            axis=0,
        )
df.head()
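If you would rather avoid an extra dependency, or want a deterministic result instead of a statistical guess, a common alternative (a different technique from chardet) is to try a short list of candidate encodings in order. The sketch below assumes every file is either UTF-8 or Windows-1252; the helper names read_csv_any_encoding and load_folder and the candidate list are hypothetical, not part of pandas.

```python
import os

import pandas as pd

# Assumption: every CSV is either UTF-8 (optionally with a BOM) or Windows-1252.
# UTF-8 is tried first: cp1252 accepts almost any byte sequence, so it would
# "succeed" on UTF-8 files too and silently produce garbled text.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def read_csv_any_encoding(path, encodings=CANDIDATE_ENCODINGS):
    """Hypothetical helper: return the first successful read_csv across encodings."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as exc:
            last_error = exc  # this encoding failed; try the next one
    if last_error is None:
        raise ValueError("no candidate encodings given")
    raise last_error


def load_folder(folder="Performance_Data"):
    """Hypothetical helper: read every .csv in folder and stack them into one DataFrame."""
    frames = [
        read_csv_any_encoding(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if name.endswith(".csv")
    ]
    # Concatenating once is cheaper than growing a DataFrame inside the loop.
    return pd.concat(frames, axis=0, ignore_index=True)
```

Usage would be df = load_folder("Performance_Data") followed by df.head(). Collecting the frames in a list and calling pd.concat once also avoids re-copying the accumulated data on every iteration, which the original loop does.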