Why did I get UnicodeDecodeError when I read a file which contains Chinese characters?

Question

>>> path = 'name.txt'
>>> content = None
>>> with open(path, 'r') as file:
...     content = file.readlines()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/mnt/lustre/share/miniconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 163: ordinal not in range(128)

When I run this code to read a file that contains Chinese characters, I get an error. The file is saved as UTF-8. My Python version is 3.6.5, but the same code runs fine in Python 2.7.

Answer 1

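open() is trying to decode the file with the ASCII codec, because no encoding was passed and the default for your environment resolved to ASCII. The easiest way to fix this is to specify the encoding explicitly: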

with open(path, 'r', encoding='utf-8') as file:
    content = file.readlines()

Your locale should probably specify UTF-8 as the preferred encoding, but I think that depends on the OS and language settings.
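To see the difference without touching the locale, here is a minimal sketch that writes a small UTF-8 file and reads it back twice. The sample text and the temporary directory are made up for illustration, and passing encoding='ascii' is only a stand-in for an environment whose default encoding is ASCII:

```python
import os
import tempfile

# Hypothetical sample data; the real name.txt from the question is not available.
text = '你好\n世界\n'
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'name.txt')

# Save the file as UTF-8, as described in the question.
with open(path, 'w', encoding='utf-8') as file:
    file.write(text)

# Forcing ASCII mimics a default encoding of ASCII: decoding fails.
try:
    with open(path, 'r', encoding='ascii') as file:
        file.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With the encoding stated explicitly, the same file reads cleanly.
with open(path, 'r', encoding='utf-8') as file:
    content = file.readlines()
```

The first read raises UnicodeDecodeError because every Chinese character is stored as bytes above 127, which ASCII cannot represent.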

Answer 2

Python 2.7 reads files into byte strings by default, so no decoding takes place and the read cannot fail this way.

Python 3.x reads files into Unicode strings by default, so the bytes in the file must be decoded.

The default encoding varies by operating system and can be determined by calling locale.getpreferredencoding(False). This is often utf8 on Linux systems, but Windows systems return a localized ANSI encoding, e.g. cp1252 for US/Western European Windows versions.
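A quick way to check what your own environment reports is sketched below. The variable name default_encoding is just for illustration, and the printed value depends entirely on the machine:

```python
import locale

# The encoding open() falls back to in text mode when none is passed.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

Given the traceback in the question, this presumably reports an ASCII variant on the asker's machine, which would explain why the 'ascii' codec was chosen.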

In Python 3, specify the encoding you expect for files so as not to rely on a locale-specific default. For example:

with open(path, 'r', encoding='utf8') as f:
    ...

You can do this in Python 2 as well, but use io.open(), which is compatible with Python 3's open() and reads Unicode strings instead of byte strings. io.open() is also available in Python 3 for portability.
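As a rough sketch of the portable form, the snippet below writes and reads a file through io.open(). The file location and sample text are invented for the example, and the text uses escape sequences so the source itself stays ASCII-only:

```python
import io
import os
import tempfile

# Hypothetical file location, used only for this example.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'name.txt')

# u'\u4e2d\u6587' is the word for "Chinese"; the u prefix keeps it a
# Unicode string on Python 2 and is accepted on Python 3.3 and later.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(u'\u4e2d\u6587\n')

# io.open() decodes with the given encoding on both Python 2 and 3.
with io.open(path, 'r', encoding='utf8') as f:
    lines = f.readlines()
```

On Python 3, io.open is the same function as the built-in open, so this costs nothing there.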
