Encoding error when reading a file in Python 3?

Question

When I read a file in Python and print it to the screen, certain characters are not displayed correctly, yet the same characters hard-coded into a variable print just fine. Here is an example where "test.html" contains the text "Hallå":

with open('test.html','r') as file:
    Str = file.read()
print(Str)
Str = "Hallå"
print(Str)

This generates the following output:

hallÃ¥
Hallå

My guess is that something goes wrong in how the file's data is interpreted when it is read into Python, but I am not sure what, since Python 3.8.5 already uses UTF-8 encoding by default.

Answer 1

The open function does not use UTF-8 by default. As the documentation says:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
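To see which encoding open would fall back to on a particular machine, one can call the same function the documentation mentions. This is only a minimal sketch; the variable name default_encoding is illustrative, and the printed value will differ between platforms and locale settings.

```python
import locale

# Encoding that open() falls back to in text mode when encoding= is omitted.
# The result depends on the platform and locale, so no fixed output is shown.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints something other than a UTF-8 name, a UTF-8 file read without an explicit encoding will be decoded with that other codec.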

So it depends, and to be certain you have to specify the encoding yourself. If the file is saved in UTF-8, you should do this:

with open('test.html', 'r', encoding='utf-8') as file:
    Str = file.read()
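The symptom in the question can be reproduced in isolation. The sketch below assumes the platform default was cp1252 (common on Western-locale Windows); the question does not actually say which codec was in effect, so treat that choice as a guess. A temporary file stands in for test.html.

```python
import os
import tempfile

text = "Hallå"

with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for the question's test.html, saved as UTF-8
    path = os.path.join(tmp, "test.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Decoding the UTF-8 bytes with the wrong codec (assumed cp1252)
    # turns the two bytes of "å" into two separate characters
    with open(path, "r", encoding="cp1252") as f:
        wrong = f.read()

    # Decoding with the codec the file was saved in round-trips the text
    with open(path, "r", encoding="utf-8") as f:
        right = f.read()

print(wrong)  # HallÃ¥
print(right)  # Hallå
```

The "å" is stored as the bytes C3 A5 in UTF-8, and cp1252 maps those bytes to "Ã" and "¥", which matches the garbled output shown in the question.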

On the other hand, it is not clear whether the file is actually saved in UTF-8. If it is not, you'll have to choose a different encoding.
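When the file's encoding is unknown, one possible approach is to read the raw bytes and try a short list of candidate codecs in order. The helper name read_text_with_fallback and the candidate list below are hypothetical, chosen only for illustration. A successful decode does not prove the guess is right, so this is a rough sketch, not a reliable detector.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return (text, encoding) for the first codec that decodes."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes, try the next one
    raise ValueError(f"none of {encodings} could decode {path!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Demo file saved as cp1252 instead of UTF-8
    path = os.path.join(tmp, "test.html")
    with open(path, "w", encoding="cp1252") as f:
        f.write("Hallå")
    text, used = read_text_with_fallback(path)

print(used)  # cp1252
```

UTF-8 is tried first because it rejects most byte sequences that were written with another codec, whereas single-byte codecs accept almost anything and would mask a wrong guess.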


Answer 1
2 Accepted 2020-10-23 18:37:46
