
Python Unicode string literals: what's the difference between '\u0391' and u'\u0391'

I am using Python 2.7.3. Can anybody explain the difference between the literals:

'\u0391'

and:

u'\u0391'

and the different ways they are echoed in the REPL below (especially the extra backslash added to a1):

>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>> 
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>> 

You can only use Unicode escapes (\uabcd) in a Unicode string literal. They have no meaning in a byte string. A Python 2 Unicode literal (u'some text') is a different type of Python object from a Python byte string ('some text').

It's like using \t versus \T; the former has meaning in Python literals (it's interpreted as a tab character), the latter just means a backslash and a capital T (two characters).

To help understand the difference between Unicode and byte strings, please do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

Note: in Python 3, the same differences apply, but 'some text' is a Unicode string literal, and b'some text' is the byte string syntax.
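
As a rough illustration of that note, here is a minimal sketch that assumes Python 3. The variable names are made up for the example, and the backslash in the bytes literal is doubled on purpose so the six bytes are spelled out explicitly instead of relying on an unrecognized escape:

```python
# Python 3 sketch: the str/bytes split mirrors Python 2's unicode/str split.
text = '\u0391'    # str literal: the \u escape is evaluated, one character
raw = b'\\u0391'   # bytes literal: a backslash followed by u0391, six bytes

print(len(text))   # 1
print(len(raw))    # 6
print(type(text).__name__, type(raw).__name__)  # str bytes

# Decoding the six bytes with the 'unicode_escape' codec evaluates the escape.
decoded = raw.decode('unicode_escape')
print(decoded == text)  # True
```

The bytes value here holds the same six characters that the plain '\u0391' literal produces in Python 2.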

Unlike in C, a Python string can be enclosed in single quotes (') as well as double quotes ("), leaving aside triple-quoted strings (""").

Thus, '\u0391' is just a string containing the six characters \, u, 0, 3, 9 and 1. When the REPL echoes this string, it shows the string's repr, in which the \ is escaped with another \.
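
The doubled backslash belongs only to the echoed representation, not to the string itself. A small sketch of this, using a raw literal (my choice, not something from the question) so that the snippet holds the same six characters on both Python 2 and Python 3:

```python
# A raw literal keeps the backslash as-is, giving the same six characters
# that the plain '\u0391' byte string holds in Python 2.
s = r'\u0391'

print(len(s))    # 6
print(s)         # \u0391
print(repr(s))   # '\\u0391'
```

print shows the actual contents, while repr (which the REPL uses when echoing a value) shows a form that could be typed back in as a literal, hence the extra backslash.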

In contrast, a u prefix makes the literal a Unicode string, in which \u escapes are evaluated. Thus, u'\u0391' is interpreted as "the Unicode string containing code point 0391", which is different from the above.
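
To check which code point the Unicode literal actually holds, something along these lines should work. This is only a sketch: the name ch is arbitrary, and it relies on the u prefix being accepted by Python 2 and by Python 3.3 or later:

```python
import unicodedata

ch = u'\u0391'   # one-character Unicode string

print(len(ch))               # 1
print(hex(ord(ch)))          # 0x391
print(unicodedata.name(ch))  # GREEK CAPITAL LETTER ALPHA

# Encoding turns the single code point into bytes; in UTF-8 it takes two.
encoded = ch.encode('utf-8')
print(len(encoded))          # 2
```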
