UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte error in Python while reading a CSV file

import pandas as pd
StopWords = pd.read_csv('stopwords.csv', encoding='UTF-8', quotechar='|', names=['StopWords'])

I am trying to read a CSV file that contains Persian-language text, and this is the error I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Without seeing the binary content of the file it is difficult to guess the actual encoding, but UTF-8, with or without a BOM (byte order mark), cannot start with 0xFF.
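To see why this exact message appears, here is a minimal sketch that reproduces it. It assumes, purely for illustration, that the file was saved as little-endian UTF-16 with a byte order mark; the question does not confirm this, and the sample text is made up.

```python
import codecs

# Hypothetical sample: Persian text stored as little-endian UTF-16 with a BOM.
# The BOM bytes FF FE come first, so the very first byte of the data is 0xFF.
raw = codecs.BOM_UTF16_LE + "سلام".encode("utf-16-le")

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xFF is never a valid byte in UTF-8, so decoding fails at position 0.
    error = exc
    print(exc)
```

The exception object carries the offending offset in `error.start`, which is handy when the bad byte is not at the beginning of the file.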

If the file starts with 0xFF, it is probably in little-endian UTF-16 or UTF-32, which are the only Unicode serialisations whose byte order mark starts with 0xFF.

https://en.wikipedia.org/wiki/Byte_order_mark gives some explanation.
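One way to check for a byte order mark is to read the first few bytes of the file in binary mode and compare them with the BOM constants in Python's `codecs` module. The sketch below does that; the helper names `sniff_bom` and `sniff_file` are made up for this example, and a file without any BOM simply yields `None`.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks because
# the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head):
    """Return a Python codec name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of `path` and look for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

If `sniff_file('stopwords.csv')` returns, say, `'utf-16'`, that name can be passed as the `encoding` argument of `pd.read_csv` in place of `'UTF-8'`. The returned codec names (`utf-16`, `utf-32`, `utf-8-sig`) are the Python variants that consume the BOM while decoding.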

It is also possible that the file uses a Persian-specific character set. When generating your source CSV files, avoid national character sets if a Unicode option is available.
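If the file does turn out to be in a legacy code page and you cannot regenerate it, a one-off conversion to UTF-8 is one option. This is only a sketch: the function name `transcode_to_utf8` is hypothetical, and the source encoding has to be known beforehand. `cp1256` (Windows Arabic) in the usage note is an assumed example, not something the question confirms.

```python
def transcode_to_utf8(src, dst, source_encoding):
    """Rewrite the text file `src` as UTF-8 in `dst`.

    `source_encoding` must be the file's real encoding; a wrong guess for a
    single-byte code page usually decodes without error but yields mojibake.
    """
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

For example, `transcode_to_utf8('stopwords.csv', 'stopwords_utf8.csv', 'cp1256')` would produce a file that the original `pd.read_csv(..., encoding='UTF-8')` call can read, provided the guess about the source encoding is right. Inspect the converted text by eye before relying on it.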


