Python: how to convert utf-8 code string back to string?

Question

I am using Python, and unfortunately my code needs to convert a string that represents the UTF-8 code of a string back into the original string. For example:

UTF-8 code string that I got from other code:

\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5

I need to convert it back to the original string. How can I do that?

Answer 1

I think this is what you want. What you have isn't really a UTF-8 byte string; it is a string of ASCII \uXXXX escape sequences (well, technically it is valid UTF-8, but only because ASCII is a subset of UTF-8).

>>> s='\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'
>>> print s.decode('unicode-escape')
欢迎提交微博搜索使用反馈，请直接

FYI, this is what the actual UTF-8 encoding looks like:

>>> s.decode('unicode-escape').encode('utf8')
'\xe6\xac\xa2\xe8\xbf\x8e\xe6\x8f\x90\xe4\xba\xa4\xe5\xbe\xae\xe5\x8d\x9a\xe6\x90\x9c\xe7\xb4\xa2\xe4\xbd\xbf\xe7\x94\xa8\xe5\x8f\x8d\xe9\xa6\x88\xef\xbc\x8c\xe8\xaf\xb7\xe7\x9b\xb4\xe6\x8e\xa5'
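The session above is Python 2. On Python 3, str objects have no decode method, so a rough equivalent is to encode to bytes first and then decode with the unicode_escape codec. This is only a sketch: it assumes the input is pure ASCII, and the names escaped, text, and utf8_bytes are illustrative.

```python
# Python 3 sketch; `escaped`, `text`, and `utf8_bytes` are illustrative names.
# A raw string keeps the backslashes literal, as if the text came from other code.
escaped = r"\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5"

# str -> ASCII bytes -> str with the \uXXXX escapes resolved.
# Assumes pure-ASCII input; encode("ascii") raises UnicodeEncodeError otherwise.
text = escaped.encode("ascii").decode("unicode_escape")
print(text)  # 欢迎提交微博搜索使用反馈，请直接

# The real UTF-8 encoding of the decoded text is a bytes object.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes[:3])  # b'\xe6\xac\xa2'
```

If the input may already contain non-ASCII characters, this round trip is not safe as written and the input would need to be handled differently.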

Answer 2

If I understand the question, we have a plain byte string that contains Unicode escape sequences, something like this:

a = '\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5'

In [122]: a
Out[122]: '\\u6b22\\u8fce\\u63d0\\u4ea4\\u5fae\\u535a\\u641c\\u7d22\\u4f7f\\u7528\\u53cd\\u9988\\uff0c\\u8bf7\\u76f4\\u63a5'

So we need to manually parse the code point out of each escape sequence and convert it to the corresponding Unicode character:

\u6b22 => unichr(0x6b22) # 欢

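or, for the whole string:

# Each escape is 6 characters long ("\uXXXX"); this assumes the string contains nothing but such escapes.
print "".join(unichr(int(a[i+2:i+6], 16)) for i in range(0, len(a), 6))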
欢迎提交微博搜索使用反馈，请直接
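Slicing in fixed steps of 6 only works when the string consists entirely of \uXXXX escapes. A variant that tolerates ordinary characters mixed in could use a regular expression instead. The following is a Python 3 sketch, not part of the original answer; ESCAPE_RE, unescape_u, and mixed are hypothetical names chosen for illustration.

```python
import re

# Matches one \uXXXX escape: a literal backslash, "u", and four hex digits.
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")

def unescape_u(s):
    """Replace each backslash-u escape in s with its character; leave other text alone."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), s)

# Hypothetical input mixing plain ASCII with escapes.
mixed = r"id=42: \u6b22\u8fce!"
print(unescape_u(mixed))  # id=42: 欢迎!
```

This sketch converts each escape independently, so characters written as surrogate pairs would come out as two separate surrogates rather than one character.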

Answer 3

Mark Pilgrim explains this in his book, Dive Into Python. Take a look:

http://www.diveintopython.net/xml_processing/unicode.html

>>> s = u"\u6b22\u8fce\u63d0\u4ea4\u5fae\u535a\u641c\u7d22\u4f7f\u7528\u53cd\u9988\uff0c\u8bf7\u76f4\u63a5"
>>> print s.encode("utf-8")
欢迎提交微博搜索使用反馈，请直接
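Note that this example starts from a unicode literal, where the interpreter has already resolved the escapes, whereas the question is about text that still contains literal backslashes. A small Python 3 sketch of the difference (the variable names are illustrative):

```python
literal = "\u6b22\u8fce"    # escapes resolved by the parser: 2 characters
escaped = r"\u6b22\u8fce"   # raw string: backslashes kept, 12 ASCII characters

print(len(literal), len(escaped))  # 2 12

# Only the escaped form needs the extra decoding step (assumes ASCII-only input).
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == literal)  # True
```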

3 answers

solution1
17 ACCEPTED 2012-07-07 16:43:53

solution2
2 2012-07-07 14:42:27

solution3
-1 2012-07-07 14:33:07
