
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte

I am new to Python and am trying to read a CSV file with the script below.

Past=pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv",encoding='utf-8')

However, I get the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte". Please help me understand what the issue is. I passed the encoding argument in the script because I thought it would resolve the error.

This happens because you chose the wrong encoding.

Since you are working on a Windows machine, just replacing

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='utf-8') 

with

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='cp1252')

should solve the problem.
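To see why cp1252 is a reasonable guess here: byte 0x96 can never be the first byte of a UTF-8 character, whereas in cp1252 it maps to an en dash. A minimal sketch, using a made-up byte string (the actual contents of Past.csv are not known):

```python
# Hypothetical sample bytes; the real file contents are not known.
raw = b"2019 \x96 2020"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x96 has the bit pattern 10xxxxxx, which UTF-8 only allows as a
    # continuation byte, never as the first byte of a character.
    error_message = str(exc)
    print(error_message)

# In cp1252 (the usual Western European Windows code page) 0x96 is an en dash.
decoded = raw.decode("cp1252")
print(decoded)
```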

This solution strips out (ignores) the undecodable characters and returns the string without them. Only use it if you need to strip the characters, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    text = f.read()  # undecodable bytes are silently dropped

With errors='ignore' you will simply lose some characters. If you don't care about them (in my case they appeared to be extra characters caused by the bad formatting and programming of the clients connecting to my socket server), then it is an easy, direct solution.
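As a small illustration of what gets lost, the sketch below decodes a made-up byte string with errors="ignore" and, for comparison, with errors="replace", which marks each bad byte with U+FFFD instead of dropping it silently:

```python
# Hypothetical bytes containing one byte (0x96) that is not valid UTF-8.
raw = b"price \x96 10"

# "ignore" drops the undecodable byte without leaving any trace.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD (the replacement character) for it,
# so the position of the lost data stays visible.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

If the dropped characters might carry meaning (dashes, quotes, currency signs), "replace" at least makes the damage easy to spot afterwards.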

Try using:

pd.read_csv("Your filename", encoding="ISO-8859-1")

The text I parsed from a website was encoded in ISO-8859-1 rather than in UTF-8, the standard default.
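One caveat worth checking: ISO-8859-1 (latin1) maps every byte to the code point with the same number, so decoding never fails, but that does not guarantee the result is what the author of the file intended. A short sketch comparing it with cp1252 on the byte from the error message:

```python
raw = b"\x96"

# latin1 turns 0x96 into U+0096, an invisible C1 control character.
as_latin1 = raw.decode("latin1")

# cp1252 turns the same byte into U+2013, an en dash.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1), repr(as_cp1252))

# Every possible byte value decodes under latin1 without raising.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin1").encode("latin1")
print(round_trip == all_bytes)
```

So if the file came from a Windows program, latin1 will silence the error while cp1252 is more likely to give readable punctuation.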

The following worked very well for me:

encoding = 'latin1'

It's an old question, but it shows up when searching for solutions to this error, so I thought I'd answer for everyone who still stumbles on this thread. You can check the file's encoding first and then pass the correct value for the encoding argument. On Windows, a simple option is to open the file in Notepad++ and look at the encoding it reports. The matching value for the encoding argument can then be found in the Python documentation. See this question and its answers on Stack Overflow for more ways to determine a file's encoding.
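When no editor is at hand, a rough alternative is to try a short list of candidate encodings in order. The helper below, `first_working_encoding`, is a hypothetical name and the candidate list is an assumption; it is a heuristic sketch, not a reliable detector, because a successful decode does not prove the encoding is the right one.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate that decodes `raw` without error, else None.

    latin1 is listed last on purpose: it accepts any byte sequence,
    so it would hide every other candidate if tried earlier.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample; for a real file, read the bytes with open(path, "rb").
sample = b"2019 \x96 2020"
print(first_working_encoding(sample))
```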

Using the code below worked for me:

with open(keeniz_dir + '/world_cities.csv', 'r', encoding='latin1') as infile:
    data = infile.read()  # "infile" avoids shadowing the built-in input()

Don't pass the encoding option unless you are sure of the file's encoding. With the default, encoding=None, pandas passes errors="replace" to the open() call it makes, so characters that fail to decode are substituted with replacement characters; you can then work out the correct encoding or simply use the resulting DataFrame. If you do provide an encoding, pandas passes errors="strict" to open(), and a wrong encoding raises a ValueError (UnicodeDecodeError is a subclass of ValueError).
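The difference between the two error modes can be seen with plain open() and the standard csv module. This is only a sketch of the errors="strict" versus errors="replace" behaviour on a made-up temporary file; it does not call pandas and makes no claim about what pandas does internally.

```python
import csv
import os
import tempfile

# Write a small cp1252-encoded CSV to a temporary file (made-up data).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("name,period\nAlice,2019 \u2013 2020\n".encode("cp1252"))
    path = tmp.name

try:
    # errors="strict" (the default for open()) raises on the first bad byte.
    try:
        with open(path, encoding="utf-8", errors="strict", newline="") as f:
            list(csv.reader(f))
        strict_failed = False
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        strict_failed = True

    # errors="replace" substitutes U+FFFD for each undecodable byte instead.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

print(strict_failed)
print(rows)
```

Recent pandas versions also accept an encoding_errors argument on read_csv for choosing the error mode explicitly; check the documentation of the version you have installed.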

df = pd.read_csv( "/content/data.csv",encoding='latin1')

Just add encoding='latin1' and it works.


