UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte

Question

I am new to Python and I am trying to read a CSV file using the script below.

import pandas as pd
Past = pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv", encoding='utf-8')

But I'm getting the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte". Please help me understand the issue here. I specified the encoding in the script, thinking it would resolve the error.

Answer 1

This happens because you chose the wrong encoding.
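The mismatch can be reproduced in a few lines. The sketch below uses a hypothetical byte string (not the asker's file) containing byte 0x96. In UTF-8, 0x96 is a continuation byte, so it can never start a character, which is exactly what "invalid start byte" means; in the Windows-1252 code page (cp1252) the same byte is an en dash (–). The reported position is the index of the offending byte within the bytes handed to the decoder.

```python
# Hypothetical sample: "10 – 20" as saved by a Windows program using cp1252,
# where the en dash is stored as the single byte 0x96.
data = b"Price range: 10 \x96 20"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte UTF-8 could not decode
    error_position = exc.start
    error_reason = exc.reason
    print(f"byte 0x{data[exc.start]:02x} at position {exc.start}: {exc.reason}")

# The same bytes decode cleanly with the Windows-1252 code page,
# where 0x96 maps to U+2013 EN DASH
text = data.decode("cp1252")
print(ascii(text))  # ascii() keeps the output printable on any console
```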

Since you are working on a Windows machine, just replacing

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='utf-8')

with

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='cp1252')

should solve the problem.
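The same fix can be tried with only the standard library. This sketch writes a small CSV in cp1252 to a temporary file (the file name and rows are hypothetical stand-ins for Past.csv) and reads it back with encoding='cp1252'; passing encoding='cp1252' to pd.read_csv should select the same Python codec.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for Past.csv: a cp1252 file whose en dash is byte 0x96
rows = [["year", "period"], ["2016", "Jan \u2013 Mar"]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Past.csv"
    # newline="" is what the csv module docs recommend for files it reads/writes
    with path.open("w", encoding="cp1252", newline="") as f:
        csv.writer(f).writerows(rows)

    assert b"\x96" in path.read_bytes()  # the raw file really contains 0x96

    # Reading with the matching codec recovers the original text
    with path.open(encoding="cp1252", newline="") as f:
        loaded = list(csv.reader(f))

print(loaded == rows)
```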

Answer 2

Use this solution: it will strip out (ignore) the undecodable characters and return the string without them. Only use it if you need to strip them, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    text = f.read()  # undecodable bytes are silently dropped

With errors='ignore' you'll just lose some characters. But if you don't care about them, since they seem to be extra characters originating from bad formatting and programming of the clients connecting to my socket server, then it's an easy, direct solution. (reference)
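The trade-off is easy to see on a small hypothetical byte string: errors='ignore' silently drops the byte, errors='replace' (another standard error handler) leaves a visible U+FFFD marker, and only the correct codec keeps the en dash. The errors argument of open() accepts the same standard handler names used here with bytes.decode().

```python
# Hypothetical bytes: "10 – 20" saved as cp1252, then read as UTF-8
raw = b"10 \x96 20"

ignored = raw.decode("utf-8", errors="ignore")    # undecodable bytes are dropped
replaced = raw.decode("utf-8", errors="replace")  # each becomes U+FFFD
correct = raw.decode("cp1252")                    # the right codec keeps the dash

print(ascii(ignored), ascii(replaced), ascii(correct))
```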

Answer 3

Try using:

pd.read_csv("Your filename", encoding="ISO-8859-1")

The code I parsed from a website was in this encoding rather than the standard default, UTF-8.
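One caveat worth checking before settling on ISO-8859-1: it maps every byte from 0x00 to 0xFF to a character, so it never raises, but it turns 0x96 into the invisible C1 control character U+0096 instead of an en dash. The sketch compares the two codecs on hypothetical input; cp1252, by contrast, leaves a few bytes such as 0x81 unassigned and raises on them.

```python
raw = b"10 \x96 20"

as_latin1 = raw.decode("iso-8859-1")  # never fails: byte N -> code point N
as_cp1252 = raw.decode("cp1252")      # 0x96 -> U+2013 EN DASH

print(ascii(as_latin1))  # a control character, not a dash
print(ascii(as_cp1252))

# cp1252 leaves a handful of bytes unassigned, so it can still fail
try:
    b"\x81".decode("cp1252")
    cp1252_rejects_0x81 = False
except UnicodeDecodeError:
    cp1252_rejects_0x81 = True
print(cp1252_rejects_0x81)
```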

Answer 4

The following worked very well for me:

encoding = 'latin1'
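Here 'latin1' is just another name for the ISO-8859-1 codec from the previous answer: Python's codec registry resolves the common spellings to one codec, while cp1252 is a separate codec. A quick check with codecs.lookup(), using a hypothetical list of spellings:

```python
import codecs

names = ["latin1", "latin-1", "ISO-8859-1", "iso8859_1"]
# Collect the canonical codec name each spelling resolves to
canonical = {codecs.lookup(name).name for name in names}
print(canonical)

# cp1252 resolves to a different codec, not an alias of latin-1
print(codecs.lookup("cp1252").name)
```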

Answer 5

It's an old question, but it shows up when searching for solutions to this error, so I thought I'd answer for everyone who still stumbles on this thread. You can check the file's encoding before passing the correct value for the encoding argument. On Windows, a simple option is to open the file in Notepad++ and look at the encoding it reports. The matching value for the encoding argument can then be found in the Python documentation. See this question and its answers on Stack Overflow for more details on different ways to determine a file's encoding.
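As a rough programmatic alternative to Notepad++, the sketch below tries a hypothetical list of candidate encodings in order and returns the first one that decodes the raw bytes without error. It is only a heuristic: a clean decode does not prove an encoding is right (ISO-8859-1 accepts any bytes, which is why it is listed last), so the result should still be checked by eye. The helper names are illustrative, and guess_file_encoding reads the whole file into memory.

```python
from pathlib import Path

# Hypothetical candidate list; order matters because the first match wins
CANDIDATES = ("utf-8", "cp1252", "iso-8859-1")


def guess_encoding(raw: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes `raw` strictly, or None."""
    for encoding in candidates:
        try:
            raw.decode(encoding)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
        return encoding
    return None


def guess_file_encoding(path, candidates=CANDIDATES):
    """Read the whole file as bytes and apply guess_encoding() to it."""
    return guess_encoding(Path(path).read_bytes(), candidates)


print(guess_encoding("caf\u00e9".encode("utf-8")))
print(guess_encoding(b"10 \x96 20"))
print(guess_encoding(b"\x81"))
```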

Answer 6

The code below worked for me:

with open(keeniz_dir + '/world_cities.csv', 'r', encoding='latin1') as f:
    text = f.read()

Answer 7

Don't pass the encoding option unless you are sure about the file's encoding. The default value encoding=None passes errors="replace" to the open() call, so characters with encoding errors are substituted with replacement characters; you can then figure out the correct encoding or just use the resulting DataFrame. If a wrong encoding is provided, pandas passes errors="strict" to open() and raises a ValueError (more precisely a UnicodeDecodeError, which is a subclass of ValueError).
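The "read leniently, then inspect" workflow described here can be sketched with the standard library: open the file with errors='replace' and list the lines containing U+FFFD, the replacement character, so the bytes that did not decode can be examined. Note that pandas behavior has changed across versions; in pandas 1.3 and later, read_csv exposes this choice directly through its encoding_errors parameter (default 'strict'). The function name, file name and sample content below are hypothetical.

```python
import tempfile
from pathlib import Path

REPLACEMENT = "\ufffd"  # what errors="replace" substitutes for undecodable bytes


def lines_with_decode_errors(path, encoding="utf-8"):
    """Return (line_number, text) pairs whose bytes did not decode cleanly."""
    hits = []
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        for number, line in enumerate(f, start=1):
            if REPLACEMENT in line:
                hits.append((number, line.rstrip("\r\n")))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"  # hypothetical test file
    sample.write_bytes(b"id,period\n1,Jan - Mar\n2,Apr \x96 Jun\n")
    problems = lines_with_decode_errors(sample)

for number, text in problems:
    print(number, ascii(text))
```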

Answer 8

df = pd.read_csv("/content/data.csv", encoding='latin1')

Just add encoding='latin1'.

8 answers (score, posted):

Answer 1: 95 (accepted), 2017-08-06 09:00:24
Answer 2: 21, 2018-02-01 07:27:25
Answer 3: 19, 2018-03-07 01:59:46
Answer 4: 6, 2018-10-30 23:29:59
Answer 5: 4, 2021-08-10 11:51:22
Answer 6: 2, 2018-11-10 12:55:50
Answer 7: 2, 2021-06-07 19:47:28
Answer 8: 0, 2022-04-24 10:01:51
