How do I handle special characters in comments and hard-coded strings in Python files?

Question

This question covers the following two scenarios:

You want to assign a string with special characters to a variable:
special_char_string = "äöüáèô"
You want to allow special characters in comments:
# This is a comment with special characters in it: äöà etc.

At the moment I handle it this way:

# -*- encoding: utf-8 -*-
special_char_string = "äöüáèô".decode('utf8')
# This is a comment with special characters in it: äöà etc.

This works fine.

Is this the recommended way, or is there a better solution?

Answer 1

Python will check the first or second line for an emacs/vim-like encoding specification.

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as the encoding name. If the encoding is unknown to Python, an error is raised during compilation.

Source: PEP 263

(A UTF-8 BOM would also make Python interpret the source as UTF-8.)
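As a rough illustration of that rule, the quoted expression can be applied to the first two lines of a file by hand. This is only a sketch, not how the interpreter is implemented: the helper name find_declared_encoding is hypothetical, and restricting the search to comment lines is an assumption based on PEP 263 describing the declaration as a comment.

```python
import re

# Pattern as quoted from PEP 263 in the answer above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(lines):
    """Return the encoding named in the first two lines, or None.

    Hypothetical helper for illustration only.
    """
    for line in lines[:2]:
        # Assumption: only comment lines can carry the declaration.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            # The first group is the encoding name.
            return match.group(1)
    return None


# emacs-style declaration on the second line, after a shebang
print(find_declared_encoding(["#!/usr/bin/env python", "# -*- encoding: utf-8 -*-"]))
# vim-style declaration on the first line
print(find_declared_encoding(["# vim: set fileencoding=latin-1 :"]))
# a declaration on the third line is ignored
print(find_declared_encoding(["x = 1", "y = 2", "# coding: utf-8"]))
```

Note that "encoding: utf-8" from the question matches because the pattern only looks for the substring "coding" followed by ":" or "=".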
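The BOM case can be checked with the standard library, assuming Python 3, where tokenize.detect_encoding applies the PEP 263 rules to raw bytes. The sample source below is made up for the demonstration.

```python
import codecs
import io
import tokenize

# Source bytes that start with a UTF-8 BOM and carry no coding declaration.
source = codecs.BOM_UTF8 + 'special_char_string = "äöüáèô"\n'.encode("utf-8")

# detect_encoding takes a readline callable that yields bytes.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)

print(encoding)  # utf-8-sig
```

The name utf-8-sig is how Python reports UTF-8 input that begins with a BOM.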

I would recommend using a unicode literal instead of .decode('utf8'):

# -*- encoding: utf-8 -*-
special_char_string = u"äöüáèô"

Either way, special_char_string will then contain a unicode object rather than a str. As you can see, the two forms are semantically equivalent:

>>> u"äöüáèô" == "äöüáèô".decode('utf8')
True

And the reverse:

>>> u"äöüáèô".encode('utf8')
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
>>> "äöüáèô"
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
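For comparison, here is a sketch of the same round trip under Python 3, which the sessions above do not use. There, a plain string literal is already text (the equivalent of the Python 2 unicode type) and the encoded form is a separate bytes type.

```python
# Python 3: a plain literal is already text, no u prefix or .decode() needed.
text = "äöüáèô"

# Encoding produces bytes, the counterpart of the Python 2 str shown above.
encoded = text.encode("utf-8")
print(encoded)

# Decoding the bytes gives back the original text.
decoded = encoded.decode("utf-8")
print(decoded == text)
```

The byte values are the same as in the Python 2 session; only the type names differ.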

There is a technical difference, however: if you use u"something", the parser is told directly that this is a unicode literal, so no separate decode call has to run at run time. It should therefore be a bit faster.

Answer 2

Yes, this is the recommended way for Python 2.x; see PEP 263. In Python 3.x, the default source encoding is UTF-8 rather than ASCII, so you don't need the declaration there. See PEP 3120.
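A small way to see the Python 3 behaviour, assuming Python 3 and a made-up one-line source: compiling UTF-8 bytes that have no coding declaration works, and both the string and the comment keep their special characters.

```python
# UTF-8 encoded source with no "coding" declaration at all.
source = 'special_char_string = "äöüáèô"  # comment with äöà\n'.encode("utf-8")

# compile() accepts bytes and falls back to UTF-8 when nothing is declared.
namespace = {}
exec(compile(source, "<no_declaration>", "exec"), namespace)

print(namespace["special_char_string"])
```

Under Python 2 the same bytes would need the declaration from the question, because the default source encoding there is ASCII.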

Answer 1
4, accepted, 2011-06-22 13:00:09

Answer 2
2, 2011-06-22 12:01:50
