Get the Unicode characters of a string

Question

I'm getting a string from a Qt widget, and I'm trying to convert the non-ASCII characters (e.g. €) into their hexadecimal Unicode values (e.g. x20ac).

Currently, this is what I'm doing to see the Unicode character:

unicode_text = self.rich_text_edit.toPlainText()  # this string is the € symbol
print("unicode char is: {0}".format(unicode_text))

This gives me the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

That's actually what I want, right there: the 20ac.

How do I get at that?

If I do this:

unicode_text = str(unicode_text).encode('string_escape')
print unicode_text #returns \xe2\x82\xac

It returns 3 characters, all of them wrong. I'm going round in circles :)

I know it's a fairly basic question, but I've never had to worry about Unicode before.

Many thanks in advance, Ian

Answer 1

Use ord and hex:

>>> hex(ord(u"€"))
'0x20ac'
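Note that ord accepts exactly one character, so a longer string from the widget has to be handled one character at a time. Here is a minimal sketch of that, written for Python 3 (the u prefix is harmless there). The helper name escape_non_ascii is made up for illustration, and the x20ac output style is simply copied from the question, not a standard notation:

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with 'x' plus its hex code point."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII: keep the character unchanged.
            parts.append(ch)
        else:
            # Non-ASCII: emit the code point in lowercase hex, e.g. x20ac.
            parts.append("x{:x}".format(ord(ch)))
    return "".join(parts)


print(escape_non_ascii(u"Price: 5 \u20ac"))  # Price: 5 x20ac
```

Pure ASCII input comes back unchanged, so this is safe to call on any text from the widget.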

Answer 2

\xe2\x82\xac is the UTF-8 encoding of the Unicode code point U+20AC (\u20ac).

Think of it as follows: like ASCII, Unicode is a one-to-one mapping between integers and characters, except that Unicode defines far more integer-to-character mappings.

Your € symbol has an integer value of 8364 (0x20ac in hex), which is far too big to fit into a single 8-bit byte (a byte holds only 256 distinct values, 0 to 255), so UTF-8 breaks it down into the 3 individual bytes \xe2\x82\xac.

This is a very high-level overview, but I'd really recommend you take a look at this excellent explanation from Scott Hanselman:

Why the #AskObama Tweet was Garbled on Screen.
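To make the "broken down into 3 bytes" step concrete, here is a small sketch (Python 3) that compares the built-in UTF-8 encoder with a by-hand reconstruction. The variable names are illustrative, and the manual bit-packing only covers the three-byte UTF-8 form (code points U+0800 to U+FFFF), not UTF-8 in general:

```python
euro = u"\u20ac"  # the € sign, written as an escape so the source stays ASCII

code_point = ord(euro)
utf8_bytes = euro.encode("utf-8")

print(code_point)        # 8364
print(hex(code_point))   # 0x20ac
print(len(utf8_bytes))   # 3

# Rebuild the three bytes by hand from the 16 bits of the code point,
# using the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
manual = bytearray([
    0xE0 | (code_point >> 12),          # top 4 bits    -> 0xe2
    0x80 | ((code_point >> 6) & 0x3F),  # middle 6 bits -> 0x82
    0x80 | (code_point & 0x3F),         # low 6 bits    -> 0xac
])
print(bytes(manual) == utf8_bytes)  # True
```

So \xe2\x82\xac and 0x20ac describe the same character: one is its byte encoding, the other its code point.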

As for your question, you can simply do:

>>> print "unicode code point is: {0}".format(hex(ord(unicode_text)))
unicode code point is: 0x20ac
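If you want the escaped form that appeared in the error message (\u20ac) rather than 0x20ac, the standard unicode_escape codec is one possible route. A minimal sketch for Python 3 follows; the variable names are illustrative, and note that unicode_escape uses \xNN for characters below U+0100, so its output format depends on the character:

```python
text = u"\u20ac"  # the € sign

# Encode to the Python escape form, then decode back to a printable str.
escaped = text.encode("unicode_escape").decode("ascii")
print(escaped)  # \u20ac

# Zero-padded, uppercase hex in the conventional U+XXXX notation.
label = "U+{:04X}".format(ord(text))
print(label)  # U+20AC
```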


Answer 1
4 2014-06-23 14:10:19

Answer 2
4 Accepted 2014-06-23 14:25:33
