Why do I get a UnicodeDecodeError when I read a file that contains Chinese characters?

>>> path = 'name.txt'
>>> content = None
>>> with open(path, 'r') as file:
...     content = file.readlines()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/mnt/lustre/share/miniconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 163: ordinal not in range(128)

When I run this code to read a file that contains Chinese characters, I get an error. The file is saved as UTF-8. My Python version is 3.6.5, but the same code runs fine in Python 2.7.

open() is trying to decode the file with the ASCII codec, as the traceback shows. The easiest fix is to specify the encoding explicitly:

with open(path, 'r', encoding='utf-8') as file:
    content = file.readlines()

Ideally your locale would report UTF-8 as the preferred encoding, but I think that depends on the OS and language settings.
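To see the failure and the fix side by side, here is a minimal sketch. It does not use the asker's real file: it writes a small, hypothetical name.txt into a temporary directory, and it passes encoding='ascii' on purpose to imitate a system whose default encoding is ASCII.

```python
import os
import tempfile

# Hypothetical stand-in for the asker's name.txt: UTF-8 text with Chinese characters.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')
with open(path, 'w', encoding='utf-8') as file:
    file.write('姓名\n张伟\n')

# Forcing ASCII imitates a locale whose preferred encoding is ASCII.
try:
    with open(path, 'r', encoding='ascii') as file:
        file.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With the encoding stated explicitly, the same file reads cleanly.
with open(path, 'r', encoding='utf-8') as file:
    content = file.readlines()

print(ascii_failed, content)
```

Only the encoding argument differs between the two reads, which suggests the file itself is fine and the default codec is the problem.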

Python 2.7 reads files into byte strings by default.

Python 3.x reads files into Unicode strings by default, so the bytes in the file must be decoded.
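The difference between bytes and decoded text can be illustrated with a short sketch. The file below is a hypothetical one created in a temporary directory; binary mode ('rb') is used to show the raw bytes that Python 3 would otherwise have to decode.

```python
import os
import tempfile

# Hypothetical one-character file, written as raw UTF-8 bytes.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')
with open(path, 'wb') as f:
    f.write('名'.encode('utf-8'))

# Binary mode returns the bytes untouched, with no decoding step.
with open(path, 'rb') as f:
    raw = f.read()

# Decoding turns those bytes into a Unicode string.
text = raw.decode('utf-8')

print(raw, text)
```

The first byte of this character is 0xe5, the same byte value the ASCII codec rejected in the traceback, because ASCII only covers values below 128.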

The default encoding varies by operating system, and can be determined by calling locale.getpreferredencoding(False). This is often UTF-8 on Linux systems, but Windows systems return a localized ANSI encoding, e.g. cp1252 for US/Western European Windows versions.
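A quick way to check what your own system reports is sketched below. The codecs.lookup() call is an addition for illustration: it maps the locale's name to Python's canonical codec name, since some systems report ASCII under a less obvious alias.

```python
import codecs
import locale

# The encoding open() falls back to when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)

# Normalize the reported name to Python's canonical codec name.
codec_name = codecs.lookup(default_encoding).name

print(default_encoding, codec_name)
```

If the canonical name comes back as ascii, any non-ASCII byte in a file will trigger the error from the question unless an encoding is passed to open().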

In Python 3, specify the encoding you expect for files so as not to rely on a locale-specific default. For example:

with open(path,'r',encoding='utf8') as f:
    ...

You can do this in Python 2 as well, but use io.open(), which is compatible with Python 3's open() and reads Unicode strings instead of byte strings. io.open() is also available in Python 3 for portability.
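A minimal sketch of the portable form follows, assuming code that has to run on both Python 2.7 and Python 3. The temporary file is hypothetical, and the text is written with \u escapes so the source itself stays pure ASCII.

```python
import io
import os
import tempfile

# Hypothetical sample file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')

# io.open() accepts the encoding argument on both Python 2.7 and Python 3.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\u4e2d\u6587\n')  # the two characters of "中文" plus a newline

# Reading back yields Unicode strings on either version.
with io.open(path, 'r', encoding='utf-8') as f:
    content = f.readlines()

print(content)
```

On Python 3, io.open is the same function as the built-in open(), so this form costs nothing there.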
