Difference between u'string' and unicode(string)
This is a sample program I made:
>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212
Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?
I'm making a program that stores data rather than printing it, so how do I store ሒ instead of \u1212? Obviously I can't do something like:
x = u''+unicode('\u1212')
Interestingly, even if I do that, here's what I get:
\u1212
Another fact that I think is worth mentioning:
>>> u'\u1212' == unicode('\u1212')
False
What do I do to store ሒ, or some other character like that, instead of \uxxxx?
'\u1212' is an ASCII string with 6 characters: \, u, 1, 2, 1, and 2.
unicode('\u1212') is a Unicode string with 6 characters: \, u, 1, 2, 1, and 2.
u'\u1212' is a Unicode string with one character: ሒ.
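The difference in length can be checked directly. This is a minimal sketch, not part of the original answer. It assumes a raw literal r'\u1212' as a stand-in for the six-character string, so that the snippet behaves the same on Python 2 and Python 3. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# A raw literal keeps the backslash as an ordinary character
# on both Python 2 and Python 3 (assumption made for portability).
escaped = r'\u1212'   # six characters: \ u 1 2 1 2
single = u'\u1212'    # one character: U+1212

print(len(escaped))       # 6
print(len(single))        # 1
print(escaped == single)  # False
```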
If that's what you want, you should use Unicode strings throughout:
u'\u1212'
If for some reason you need to convert '\u1212' to u'\u1212', use
'\u1212'.decode('unicode-escape')
(Note that in Python 3, strings are always Unicode.)
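As a hedged sketch of that conversion, the snippet below uses a bytes literal so that it runs on both Python 2 and Python 3. The final encode to UTF-8 is an assumption about how the data might be stored; the original answer does not prescribe a storage encoding.

```python
# -*- coding: utf-8 -*-
escaped = b'\\u1212'                        # six bytes: \ u 1 2 1 2
decoded = escaped.decode('unicode_escape')  # one character: U+1212

# Assumption: the data is stored as UTF-8 bytes.
stored = decoded.encode('utf-8')

print(len(escaped))                # 6
print(len(decoded))                # 1
print(decoded == u'\u1212')        # True
print(stored == b'\xe1\x88\x92')   # True
```

Decoding stored with 'utf-8' when reading it back gives the single character again.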
This is just a misunderstanding.
This is a unicode string: x = u'\u1212'
When you call print x, it prints its character (ሒ) as shown. If you just evaluate x at the prompt, it shows its repr instead:
u'\u1212'
All is well with the world.
This is an ASCII string: y = "\u1212"
When you call print y, it prints its value (\u1212) as shown. If you just evaluate y at the prompt, it shows its repr instead:
'\\u1212'
Notice the double backslash (\\), which indicates that the backslash is being escaped.
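The difference between printing a string and looking at its repr can be reproduced as follows. This is only a sketch. It assumes a raw literal so that y holds the same six characters on Python 2 and Python 3.

```python
# Assumption: r'...' is used so the backslash stays literal on Python 3 too.
y = r'\u1212'

print(y)        # \u1212
print(repr(y))  # '\\u1212'
```

The repr doubles the backslash because it shows the string as a literal you could type back in. The string itself still contains only one backslash.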
So, let's look at the following function call: print unicode('\u1212')
This is a function call, and we can replace the string with a variable, so we'll use the equivalent:
y = "\u1212"
print unicode(y)
But as in the second example above, y is an ASCII string whose repr is '\\u1212'; it is not a unicode string at all. So the unicode version of '\\u1212' contains exactly the same six characters. That is why it does not behave the way you expected.
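In Python 2, unicode(y) with no encoding argument decodes y using the default encoding, which is normally ASCII. The sketch below uses an explicit decode('ascii') as a stand-in for that call, which is an assumption made so that the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
y = b'\\u1212'                 # six bytes: \ u 1 2 1 2

# Stand-in for Python 2's unicode(y) under the default ASCII encoding.
converted = y.decode('ascii')

print(len(converted))           # 6
print(converted == u'\\u1212')  # True: still backslash, u, 1, 2, 1, 2
print(converted == u'\u1212')   # False: not the single character U+1212
```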