Difference between u'string' and unicode(string)

Question

This is a sample program I made:

>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212

Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?

I'm making a program to store data, not print it, so how do I store ሒ instead of \u1212? Now, obviously I can't do something like:

x = u''+unicode('\u1212')

Interestingly, even if I do that, here's what I get:

\u1212

Another fact that I think is worth mentioning:

>>> u'\u1212' == unicode('\u1212')
False

What do I do to store ሒ, or some other character like that, instead of \uxxxx?

Answer 1

'\u1212' is an ASCII string with 6 characters: \, u, 1, 2, 1, and 2.

unicode('\u1212') is a Unicode string with 6 characters: \, u, 1, 2, 1, and 2.

u'\u1212' is a Unicode string with one character: ሒ.
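To check the character counts above, here is a small sketch in Python 3 (an assumption: the question itself uses Python 2). In Python 3, Python 2's byte string '\u1212' corresponds roughly to the bytes literal b'\\u1212' or the raw string r'\u1212', and u'\u1212' corresponds to the ordinary str '\u1212'.

```python
# Python 3 sketch: the \u escape is interpreted only in a Unicode (str) literal.
escaped_bytes = b'\\u1212'  # like Python 2's '\u1212': six ASCII bytes
escaped_text = r'\u1212'    # the same six characters as a str (raw literal)
real_char = '\u1212'        # like Python 2's u'\u1212': one character

print(len(escaped_bytes))     # 6
print(len(escaped_text))      # 6
print(list(escaped_text))     # ['\\', 'u', '1', '2', '1', '2']
print(len(real_char))         # 1
print(hex(ord(real_char)))    # 0x1212
```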

You should use Unicode strings all around, if that's what you want:

u'\u1212'
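Since the question's goal is storing the data rather than printing it, here is a minimal Python 3 sketch of that idea: keep the text as a Unicode string and encode it only when writing it out. The temporary file name and the choice of UTF-8 are illustrative assumptions, not something from the question.

```python
import os
import tempfile

text = '\u1212'  # one Unicode character, not the six characters \u1212

# Write with an explicit encoding: the file holds the UTF-8 bytes of the
# character, not a backslash escape.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with open(path, 'rb') as f:
    raw = f.read()
print(raw)  # b'\xe1\x88\x92'

# Reading back with the same encoding restores the original character.
with open(path, encoding='utf-8') as f:
    restored = f.read()
print(restored == text)  # True
```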

If for some reason you need to convert '\u1212' to u'\u1212', use:

'\u1212'.decode('unicode-escape')
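In Python 3 the same unicode-escape codec still exists, but decode() is a method of bytes, not str. A minimal sketch, assuming the input is pure ASCII apart from its escape sequences; the helper name unescape is hypothetical.

```python
def unescape(s: str) -> str:
    """Turn backslash escapes such as \\u1212 into the characters they name.

    Hypothetical helper. Assumes `s` is pure ASCII: encode('ascii') raises
    UnicodeEncodeError if `s` already contains non-ASCII characters.
    Note that other escapes (\\n, \\t, ...) are interpreted as well.
    """
    return s.encode('ascii').decode('unicode-escape')


print(b'\\u1212'.decode('unicode-escape') == '\u1212')  # True
print(unescape(r'\u1212') == '\u1212')                  # True
print(len(unescape(r'\u1212')))                         # 1
```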

(Note that in Python 3, strings are always Unicode.)
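A quick Python 3 check of that note: a plain literal is already a Unicode str, the u prefix is accepted again since Python 3.3 (PEP 414) but changes nothing, and the role of Python 2's byte string is taken by bytes.

```python
text = '\u1212'       # already Unicode in Python 3
prefixed = u'\u1212'  # u prefix allowed since Python 3.3; same value
raw = b'\\u1212'      # bytes: the counterpart of a Python 2 str

print(type(text).__name__, type(prefixed).__name__, type(raw).__name__)  # str str bytes
print(text == prefixed)  # True
print(text == raw)       # False: str and bytes never compare equal
```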

Answer 2

This is just a misunderstanding.

This is a Unicode string: x = u'\u1212'

When you call print x, it will print its character (ሒ), as shown. If you just evaluate x at the interactive prompt, it will show its repr (representation):

u'\u1212'

All is well with the world.

This is an ASCII string: y = "\u1212"

When you call print y, it will print its value (\u1212), as shown. If you just evaluate y, it will show its repr:
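Python 3 behaves slightly differently here (an assumption worth checking if you are on Python 3): repr() shows printable non-ASCII characters as themselves, while ascii() produces the escaped form that Python 2's repr showed. A small sketch:

```python
x = '\u1212'

# print(x) writes the character itself; repr(x) is what the interactive
# prompt shows when you evaluate x without print.
shown_by_repr = repr(x)    # the character itself in quotes: U+1212 is printable
shown_by_ascii = ascii(x)  # "'\\u1212'": the escaped form Python 2's repr used

print(shown_by_repr == "'\u1212'")    # True
print(shown_by_ascii == "'\\u1212'")  # True
print(x.isprintable())                # True
```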

'\\u1212'

Notice the double backslash (\\), which indicates that the backslash is being escaped.

So, let's look at the following function call: print unicode('\u1212')

This is a function call, and we can replace the string with a variable, so we'll use the equivalent:
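The doubled backslash exists only in the representation; the string itself contains a single backslash. A Python 3 sketch, using a raw literal as a stand-in for the Python 2 ASCII string y (an assumption, since the answer uses Python 2):

```python
y = r'\u1212'  # six characters: \ u 1 2 1 2 (stand-in for Python 2's "\u1212")

print(y)                    # \u1212
print(repr(y))              # '\\u1212'
print(len(y))               # 6: one backslash in the value
print(len(repr(y)))         # 9: two quotes plus a doubled backslash
print(y.count('\\'))        # 1
print(repr(y).count('\\'))  # 2
```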

y = "\u1212"
print unicode(y)

But, as in the second example above, y is an ASCII string whose value is the six characters \u1212 (its repr is '\\u1212'); it's not a Unicode string at all. unicode(y) converts each of those six ASCII characters to the same Unicode character without interpreting the \u escape, so the result is still \u1212. That's why it doesn't behave the way you expected.
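Python 2's unicode(y) decodes the byte string with the default ASCII codec, which maps each byte to the same code point and never looks at escape sequences. A Python 3 sketch of that behaviour, using bytes.decode('ascii') as an assumed stand-in for unicode(y):

```python
y = b'\\u1212'  # Python 3 stand-in for the Python 2 ASCII string "\u1212"

as_unicode = y.decode('ascii')  # roughly what unicode(y) did in Python 2
print([hex(ord(c)) for c in as_unicode])
# ['0x5c', '0x75', '0x31', '0x32', '0x31', '0x32'], i.e. \ u 1 2 1 2

# Same outcome as the question's comparison: six characters vs one.
print(as_unicode == '\u1212')  # False
```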

Answer 1: score 6, accepted, 2013-12-11 05:08:28
Answer 2: score 1