invalid continuation byte when reading file

Here is my line of code:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames)

It results in the error:

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

I have specified an encoding in my code:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames, encoding='utf-8')

But the problem persists. What should I do?

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

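This error message means the file is not valid UTF-8, so you should NOT read it with the utf-8 encoding. In UTF-8, the byte 0xe9 starts a three-byte sequence and must be followed by continuation bytes (0x80 to 0xbf). Here it is followed by something else, which usually means the file was saved in a different encoding.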
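To make the message less mysterious, here is a minimal sketch that reproduces it with the standard library only. The sample text is made up for illustration and is not taken from the file in the question.

```python
# Made-up sample: "Amélie" saved as Latin-1, so "é" is the single byte 0xe9.
raw = "Amélie".encode("latin-1")  # b'Am\xe9lie'

reason = None
bad_offset = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xe9 announces a three-byte UTF-8 sequence, but the next byte is "l"
    # (0x6c), which is not a continuation byte (0x80-0xbf).
    reason = err.reason
    bad_offset = err.start

print(reason)                 # invalid continuation byte
print(bad_offset)             # 2
print(raw.decode("latin-1"))  # Amélie
```

The same bytes decode cleanly once the codec matches the one the text was written with.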

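The real encoding might be utf-16, gbk, latin-1, cp1252 and so on. For a lone 0xe9 in otherwise ASCII text, latin-1 or cp1252 is a common candidate, since 0xe9 is "é" in both.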
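Trying candidates by hand can be sketched as a small loop. The helper name first_working_encoding and the sample bytes are hypothetical. Note that a successful decode does not prove the guess is right: latin-1 accepts any byte sequence, and multi-byte codecs such as gbk or utf-16 can accept the wrong bytes and silently produce garbage, so check the decoded text by eye.

```python
def first_working_encoding(raw, candidates):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return name
    return None


# Made-up sample bytes containing 0xe9.
sample = b"Am\xe9lie (2001)"
print(first_working_encoding(sample, ["utf-8", "latin-1"]))  # latin-1
```

For a real file, read its bytes with open(m_path, "rb").read() and pass the winning name as the encoding argument of pd.read_table.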

If you still get an error like this after trying a few likely encodings, I suggest the chardet package.

It is very easy to use:

import chardet
with open("your_file", mode="rb") as f:
    print(chardet.detect(f.read(2000)))

rb means the file is opened in binary mode, so you get raw bytes rather than decoded text. 2000 is the number of bytes passed to the detector. In general, the larger the sample, the more accurate the result.
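The detection result can then be fed back into the read call. The following is only a sketch: the helper name detect_encoding is hypothetical, it assumes chardet is installed, and the detected encoding is a statistical guess that should be checked against the confidence value.

```python
import chardet


def detect_encoding(path, sample_size=2000):
    """Guess the encoding of the first sample_size bytes of a file."""
    with open(path, mode="rb") as f:
        guess = chardet.detect(f.read(sample_size))
    # guess is a dict with "encoding" and "confidence" keys;
    # "encoding" can be None when chardet cannot decide.
    return guess["encoding"], guess["confidence"]


# Hypothetical usage with the call from the question:
# encoding, confidence = detect_encoding(m_path)
# m_data = pd.read_table(m_path, sep='::', header=None, names=mnames,
#                        encoding=encoding)
```

If the confidence is low, increase sample_size or compare the guess against the encodings you already suspect.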

chardet - pypi
