Why does a UnicodeDecodeError occur when reading a file that contains Chinese characters?

Question

>>> path = 'name.txt'
>>> content = None
>>> with open(path, 'r') as file:
...     content = file.readlines()
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/mnt/lustre/share/miniconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 163: ordinal not in range(128)

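When I run this code to read a file that contains Chinese characters, I get the error above. The file is saved as UTF-8. My Python version is 3.6.5. The same code runs fine in Python 2.7.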

Answer 1

open is trying to read the file with the ASCII codec. The simplest way to fix this is to specify the encoding:

with open(path, 'r', encoding='utf-8') as file:
    content = file.readlines()

Your locale should probably specify UTF-8 as the preferred encoding, but I think that depends on the operating system and language settings.

Answer 2

By default, Python 2.7 reads files into byte strings, so no decoding takes place and this error cannot occur at read time.

Python 3.x reads files into Unicode strings by default, so the bytes in the file must be decoded.
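
To make the decoding step concrete, here is a minimal sketch. The sample text and variable names are illustrative assumptions (the contents of the asker's file are not known). It shows that the UTF-8 bytes of a Chinese character lie outside the ASCII range, so the ASCII codec rejects them while the UTF-8 codec round-trips the text.

```python
# Illustrative sample text: the characters U+540D U+5B57 (an assumption,
# not taken from the asker's file). Escapes keep this source pure ASCII.
text = '\u540d\u5b57'
raw = text.encode('utf-8')  # the bytes as they would be stored on disk

# The first byte is 0xE5, the same byte value reported in the traceback.
first_byte = raw[0]

# Decoding with the ASCII codec fails because the bytes are >= 0x80.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the codec the file was actually saved in succeeds.
decoded = raw.decode('utf-8')
```

Here ascii_error holds the same kind of exception as in the question, and decoded equals the original text.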

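The default encoding used varies by operating system, but you can determine it by calling locale.getpreferredencoding(False). On Linux systems this is usually utf8, but Windows systems return a localized ANSI encoding, for example cp1252 on US/Western European versions of Windows.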
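
A quick way to check what your own environment reports is sketched below. The temporary file and the variable names are assumptions for illustration, and the printed value depends entirely on the machine and its locale settings. A result such as ANSI_X3.4-1968 (an alias for ASCII, typically seen when the locale is C or POSIX) would be consistent with the traceback in the question.

```python
import locale
import os
import tempfile

# The locale's preferred encoding, as described above.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Create an empty scratch file (hypothetical, only for this demonstration).
fd, tmp_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# A text-mode file object records the encoding it will decode with.
with open(tmp_path, 'r') as f:
    implicit = f.encoding   # chosen by Python from the environment

with open(tmp_path, 'r', encoding='utf-8') as f:
    explicit = f.encoding   # always 'utf-8', independent of the locale

os.remove(tmp_path)
print(implicit, explicit)
```

If implicit is not a UTF-8 encoding on your system, passing encoding explicitly, as in the second open() call, is what removes the dependency on the environment.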

In Python 3, specify the encoding you expect the file to have, so that you do not rely on a locale-specific default. For example:

with open(path, 'r', encoding='utf8') as f:
    ...

You can do this in Python 2 as well, but use io.open(), which is compatible with Python 3's open() and reads Unicode strings instead of byte strings. io.open() is also available in Python 3, for portability.
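
A minimal sketch of the portable form follows. The temporary file and the sample text are assumptions made so the example is self-contained; in practice you would pass your own path.

```python
import io
import os
import tempfile

# Scratch file for the demonstration (hypothetical path, removed at the end).
fd, tmp_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# The u'' prefix is accepted by Python 2 and by Python 3.3+.
sample = u'\u540d\u5b57\n'

# io.open() takes an encoding argument on both major versions.
with io.open(tmp_path, 'w', encoding='utf-8') as f:
    f.write(sample)

# Reading back yields Unicode strings, decoded as UTF-8 regardless of locale.
with io.open(tmp_path, 'r', encoding='utf-8') as f:
    lines = f.readlines()

os.remove(tmp_path)
```

In Python 3 the built-in open() is the same function as io.open(), so the import only matters for code that must also run on Python 2.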
