
Why doesn't Python recognize my utf-8 encoded source file?

Here is a little tmp.py with a non-ASCII character:

if __name__ == "__main__":
    s = 'ß'
    print(s)

Running it, I get the following error:

Traceback (most recent call last):
  File ".\tmp.py", line 3, in <module>
    print(s)
  File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

The Python docs say:

By default, Python source files are treated as encoded in UTF-8...

My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK, that is, it looks the way it does above in this question (with the ß symbol).

If I put:

# -*- encoding: utf-8 -*-

as the first line in tmp.py, it does not change anything; the error persists.

Could someone help me figure out what I am doing wrong?

The encoding your terminal is using doesn't support that character:

>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

Python is reading the source file just fine; it's your output encoding (cp866, the console code page) that cannot represent the character.
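To make the distinction between source encoding and output encoding concrete, here is a minimal sketch. The helper name `can_encode` is hypothetical (it is not part of Python or of the original post); it only wraps `str.encode` to test whether a given codec can represent a string.

```python
import sys

def can_encode(text, encoding):
    """Return True if `text` can be encoded with the codec `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

s = "\xdf"  # the character 'ß' from tmp.py

# The encoding print() uses when writing to the console.
print(sys.stdout.encoding)

# cp866 has no slot for 'ß', while UTF-8 can represent it.
print(can_encode(s, "cp866"))
print(can_encode(s, "utf-8"))
```

If the first line printed is cp866, the string itself is fine and only the write to the console fails.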

You can try using chcp 65001 in the Windows console to switch your code page to UTF-8; chcp is a Windows command-line command for changing code pages.

My terminal, on OS X (using UTF-8), can handle it just fine:

>>> print('\xdf')
ß
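If changing the console code page is not an option, one possible workaround (an assumption on my part, not something from the answer above) is to escape the characters the console cannot show instead of letting print() raise. The helper name `printable` is hypothetical; it relies only on the standard `backslashreplace` error handler of `str.encode`.

```python
import sys

def printable(text, encoding=None):
    """Round-trip `text` through `encoding`, escaping unsupported characters."""
    # Fall back to the console encoding, then to ASCII if none is reported.
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# With cp866 the character is replaced by its escape sequence \xdf,
# so printing the result cannot raise UnicodeEncodeError on that console.
print(printable("\xdf", "cp866"))
```

This trades fidelity for robustness: the output shows an escape sequence rather than ß, but the script no longer crashes.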
