Using Unicode characters correctly in Python 3: forcing UTF-8 encoding

Question

I'm going crazy here. The internet and this SO question tell me that in Python 3.x, the default encoding is UTF-8. In addition to that, my system's default encoding is UTF-8, and I have # -*- coding: utf-8 -*- at the top of my Python 3.5 file.

Still, Python is using ASCII:

# -*- coding: utf-8 -*-
mystring = "Ⓐ"
print(mystring)

It greets me with:

SyntaxError: 'ascii' codec can't decode byte 0xe2 in position 7: ordinal not in range(128)

I've also tried print(mystring.encode("utf-8")) and .decode("utf-8"), with the same result.

What am I missing here? How do I force Python to stop using ASCII encoding?

Edit: I know that it seems weird to complain about position 7 with a one-character string, but this is my actual MCVE and the exact output I'm getting. The above is using the Python shell, the below is in a script. Both use Python 3.5.2.

Edit: Since I figured it might be relevant: the string I'm getting comes from an external application and is not hardcoded, so I need a way to take that UTF-8 string and save it to a file. The above is just a minimized and generalized example. Here is my real-life code:

# fromuser and fromname are strings that might contain Unicode characters
mystring = "username: " + fromuser + " | printname: " + fromname
with open("myfile.txt", "a") as myfile:
  myfile.write(mystring + "\n")

Answer 1

In Python 3 all strings are Unicode, so the problem you're having is likely due to your locale settings not being correct. The Python 3 interpreter tries to use the locale environment variables, and if it cannot find them it emulates basic ASCII.

From locale.py:

except ImportError:

    # Locale emulation

    CHAR_MAX = 127
    LC_ALL = 6
    LC_COLLATE = 3
    LC_CTYPE = 0
    LC_MESSAGES = 5
    LC_MONETARY = 4
    LC_NUMERIC = 1
    LC_TIME = 2
    Error = ValueError

Double-check the locale of the shell from which you are executing. Here are a few workarounds you can try to see if they get you working before you go through the task of setting up your environment correctly.

1) Validate that UTF-8 locale or language files are installed (see link above)

2) Try adding this to the top of your script

#!/usr/bin/env LC_ALL=en_US.UTF-8 /usr/local/bin/python3
print('カタカナ')

or

#!/usr/bin/env LANG=en_US.UTF-8 /usr/local/bin/python3
print('カタカナ')

Or export the shell variables before executing the Python interpreter:

export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
python3
>>> print('カタカナ')

Sorry I cannot be more specific, as these settings are platform and OS specific. You can forcefully attempt to set the locale in Python directly using the locale module, but I don't recommend that, and it won't help if the locales are not installed.
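To check which encodings the interpreter actually settled on, a small script along these lines may help. It is only a sketch: describe_encodings and append_line are hypothetical helper names, not something from the answer above, and the file-writing part assumes that passing encoding="utf-8" to open() suits your data. In Python 3.5, open() without an encoding argument falls back to locale.getpreferredencoding(False), which is why a broken locale can also affect file writes.

```python
import locale
import os
import sys
import tempfile


def describe_encodings():
    """Collect the encodings the running interpreter has settled on."""
    return {
        "locale preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "str default": sys.getdefaultencoding(),
    }


def append_line(path, text):
    """Append one line as UTF-8, independent of the locale (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as myfile:
        myfile.write(text + "\n")


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)

    # Write a non-ASCII character to a throwaway file and show the raw bytes.
    demo_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
    append_line(demo_path, "username: \u24b6")
    with open(demo_path, "rb") as raw:
        print(raw.read())
```

If "locale preferred" or "stdout" reports something like ANSI_X3.4-1968 or US-ASCII, the locale is the likely culprit. Passing the encoding explicitly, as append_line does, sidesteps the locale for file output only; it does not change what print() uses.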

Hope that helps.

Answer 2

What's New in Python 3.0 says:

All text is Unicode; however encoded Unicode is represented as binary data

If you want to try decoding UTF-8 bytes into text, here's an example:

b'\x41'.decode("utf-8", "strict")

If you'd like to use Unicode in a string literal, use the Unicode escape with the character's code point. For your example:

print("\u24B6")
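As a rough illustration of the text/bytes split quoted above, the sketch below encodes the circled A from the question to UTF-8 bytes, decodes it back, and then shows the ASCII codec rejecting the same bytes. The variable names are illustrative assumptions, not part of the original answer.

```python
circled_a = "\u24B6"  # CIRCLED LATIN CAPITAL LETTER A, the character from the question

encoded = circled_a.encode("utf-8")  # str -> bytes
decoded = encoded.decode("utf-8")    # bytes -> str

print(encoded)               # the three UTF-8 bytes; the first one is 0xe2
print(decoded == circled_a)  # the round trip preserves the text

# The ASCII codec cannot interpret those bytes, so decoding with it fails.
ascii_error = None
try:
    encoded.decode("ascii")
except UnicodeDecodeError as err:
    ascii_error = err
    print(ascii_error)
```

The leading 0xe2 byte is the same one named in the question's error message, which suggests that something in the environment was treating UTF-8 input as ASCII.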
