Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'
I am using Python 2.7.3. Can anybody explain the difference between the literals:
'\u0391'
and:
u'\u0391'
and the different way they are echoed in the REPL below (especially the extra slash added to a1):
>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>>
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>>
You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python byte string ('some text').
It's like using \t versus \T; the former has meaning in Python literals (it's interpreted as a tab character), the latter just means a backslash and a capital T (two characters).
To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.
Note: in Python 3, the same differences apply, but 'some text' is a Unicode string literal, and b'some text' is the bytestring syntax.
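A minimal sketch of the Python 3 behaviour described in the note, assuming a Python 3 interpreter; the variable names are illustrative only. The backslash in the bytes literal is written explicitly as \\, because recent Python 3 releases warn about unrecognised escapes such as \u inside a bytes literal.

```python
# Python 3: an unprefixed literal is a Unicode string (str).
text = '\u0391'      # escape evaluated: a single code point, U+0391

# A b'' literal is a byte string; \u has no special meaning there.
raw = b'\\u0391'     # six bytes: \ u 0 3 9 1

print(type(text).__name__, len(text))   # str 1
print(type(raw).__name__, len(raw))     # bytes 6

# The two are related only through an explicit decode/encode step.
assert raw.decode('unicode_escape') == text
assert text.encode('utf-8') == b'\xce\x91'
```

The last two lines show that moving between the two types always goes through a named codec; nothing is converted implicitly.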
As opposed to C, in Python a string can be enclosed in single quotes (') as well as double quotes ("), leaving aside the triple-double quotes (""").
Thus, in Python 2, '\u0391' without a prefix is only a byte string containing the six characters \, u, 0, 3, 9 and 1. When the REPL echoes this string it shows its repr, in which the \ is escaped via another \.
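The doubled backslash is only a display matter. As a rough sketch (written so that it should behave the same on Python 2.7 and Python 3; the name s is illustrative), repr() shows how the string would have to be typed as a literal, while print shows its actual contents:

```python
# Six characters: a backslash followed by u0391.
# The backslash is written explicitly as \\ so the literal means
# the same thing on Python 2 and Python 3.
s = '\\u0391'

print(len(s))    # 6
print(repr(s))   # '\\u0391'  (what the REPL echoes)
print(s)         # \u0391     (the actual characters)

# The string itself contains exactly one backslash;
# only its repr contains two.
assert s.count('\\') == 1
assert repr(s).count('\\') == 2
```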
On the contrary, a u prefix makes the literal a Unicode string, in which Unicode escapes are evaluated. Thus, u'\u0391' is interpreted as "the Unicode string containing code point 0391", which is different from the above.
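To check which character that code point is, the standard unicodedata module can be used. This is a small sketch, with alpha as an illustrative name; the u prefix is accepted by Python 2 and by Python 3.3 and later, so it should run on either.

```python
import unicodedata

# The u prefix makes this a Unicode string; \u0391 is evaluated.
alpha = u'\u0391'

print(len(alpha))                # 1
print(hex(ord(alpha)))           # 0x391
print(unicodedata.name(alpha))   # GREEK CAPITAL LETTER ALPHA
```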