invalid continuation byte when reading file

Question

This line of my code:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames)

results in the error:

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

I have tried specifying the encoding explicitly:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames, encoding='utf-8')

But the error persists. What should I do?

Answer 1

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

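This error message means the file is not valid UTF-8, so you should NOT read it with the utf-8 encoding. Byte 0xe9 is "é" in Latin-1 (ISO-8859-1) and cp1252, so one of those is a likely candidate.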
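
To see where the message comes from, here is a minimal sketch that reproduces it on a made-up byte string (the sample text is my assumption, not taken from your file). It only uses the standard library:

```python
# Hypothetical sample: "café" encoded as Latin-1, where é is the single byte 0xe9.
raw = "café::1995".encode("latin-1")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Keep a reference, because `exc` is cleared when the except block ends.
    utf8_error = exc

print(utf8_error.reason)      # why UTF-8 decoding failed
print(utf8_error.start)       # offset of the offending byte
print(raw.decode("latin-1"))  # the same bytes decode cleanly as Latin-1
```

In UTF-8, 0xe9 announces a three-byte sequence, so the decoder expects a continuation byte next and fails when it finds an ordinary character instead.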

The file may be in another encoding such as Latin-1, UTF-16, or GBK.
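
If you want to try a few candidates in turn, something like the sketch below may help. The function name first_decodable and the candidate list are my own choices for illustration. Keep in mind that a successful decode does not prove the encoding is correct: Latin-1 accepts any byte sequence, so it belongs last in the list.

```python
def first_decodable(raw, candidates=("utf-8", "gbk", "latin-1")):
    """Return the first encoding in `candidates` that decodes `raw` without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next one
        return enc
    return None

# Usage with the names from the question (m_path, mnames), left as comments
# because it needs your file and pandas:
# with open(m_path, "rb") as f:
#     enc = first_decodable(f.read())
# m_data = pd.read_table(m_path, sep='::', header=None, names=mnames, encoding=enc)
```

After loading, check a few rows that contain accented or non-English text to confirm the characters look right.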

If you still get the same error after trying a few likely encodings, I suggest the chardet package, which is very easy to use.

import chardet
with open("your_file", mode="rb") as f:
    print(chardet.detect(f.read(2000)))

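mode="rb" opens the file in binary mode, and 2000 is the number of bytes passed to the detector. A larger sample usually gives a more accurate result. In your case the offending byte is at position 3114, so read more than 2000 bytes (or the whole file with f.read()); otherwise the sample may contain nothing but plain ASCII.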
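
The sample size matters more than it may seem. The sketch below uses made-up data (ASCII everywhere except a single 0xe9 at offset 3114, the position from your error) to show that a 2000-byte sample contains no hint of the real encoding. The helper name sample_is_pure_ascii is hypothetical, and bytes.isascii() needs Python 3.7 or later.

```python
def sample_is_pure_ascii(raw, size):
    """True if the first `size` bytes are all ASCII, which gives a detector nothing to go on."""
    return raw[:size].isascii()

# Hypothetical file contents: one non-ASCII byte at offset 3114.
raw = b"a" * 3114 + b"\xe9" + b"a" * 100

print(sample_is_pure_ascii(raw, 2000))      # True: the sample stops before the 0xe9 byte
print(sample_is_pure_ascii(raw, len(raw)))  # False: the whole file includes it
```

If the file is not huge, passing the whole content to the detector is the simplest way to avoid this.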