How do I convert unicode escape sequences to unicode characters in a Python string

Question

When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\\xf6ld". I want the escape sequence to be returned as the actual character in the string. How can I do this in Python?

Answer 1

Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:

>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'

Another way of achieving this:

>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'

Note the "u" in front of the string, signalling that it is unicode. If you print this, the accented letter is shown properly:

>>> print name.decode('latin-1')
Christensen Sköld

BTW: when necessary, you can use the "encode" method to turn the unicode into, e.g., a UTF-8 string:

>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'
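The session above is Python 2. As a rough sketch of the same decode-then-encode round trip in Python 3, where byte strings and text are separate types, something like the following should apply. The variable names are made up for illustration, and the bytes are assumed to be Latin-1 encoded, as in the answer.

```python
# Python 3 sketch; raw_name is a hypothetical bytes value holding Latin-1 data.
raw_name = b'Christensen Sk\xf6ld'

# Decode the bytes to text (str), naming the encoding they were written in.
text_name = raw_name.decode('latin-1')

# Encode the text back to bytes, this time as UTF-8.
utf8_name = text_name.encode('utf-8')

# The o-umlaut is one byte in Latin-1 and two bytes in UTF-8.
print(len(raw_name), len(utf8_name))
print(utf8_name)
```

If the bytes were not actually Latin-1, the decode step would silently produce the wrong characters, so the encoding name is the part to double-check.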

Answer 2

Given a byte string containing Unicode escapes, such as b"\\N{SNOWMAN}", calling b"\\N{SNOWMAN}".decode('unicode-escape') produces the expected Unicode string u'\u2603'.
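Applied to the value from the question, which contains a literal backslash followed by "xf6", the same codec can be sketched in Python 3 as below. This assumes the text outside the escapes is plain ASCII. The variable names are illustrative only.

```python
# The value from the question: a literal backslash followed by "xf6",
# not the single character U+00F6.
escaped = 'Christensen Sk\\xf6ld'

# The 'unicode_escape' codec decodes bytes, so convert the text to bytes first.
# Assumption: everything outside the escape sequences is plain ASCII.
unescaped = escaped.encode('latin-1').decode('unicode_escape')

# The byte-string example from this answer works the same way.
snowman = b'\\N{SNOWMAN}'.decode('unicode_escape')

# The four-character escape collapses into one character.
print(len(escaped), len(unescaped))
```

If the input can also contain real non-ASCII characters next to the escapes, this round trip through Latin-1 may mangle them, so it is best kept to input that is known to be ASCII plus escapes.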

Answer 3

I suspect that it's actually working correctly. When the interactive interpreter echoes a value, it shows the string's repr, which escapes non-ASCII characters, since not all terminals support unicode. If you actually print the string, though, it should work. See the following example:

>>> u'\xcfa'
u'\xcfa'
>>> print u'\xcfa'
Ïa
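The example above is Python 2. In Python 3 the interpreter's echo no longer escapes printable non-ASCII characters, but the same distinction between a string and its escaped display can be sketched with the built-in ascii(). The variable names here are illustrative.

```python
s = '\xcfa'  # two characters: U+00CF followed by "a"

# ascii() returns an escaped, ASCII-only representation, much like the
# Python 2 echo shown above. The string itself is unchanged.
escaped_view = ascii(s)

print(escaped_view)  # shows the escape sequence and the surrounding quotes
print(len(s))        # the string still holds two characters, not five
```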

3 answers

solution1
29 ACCEPTED 2009-06-14 06:46:22

solution2
8 2012-08-23 00:36:28

solution3
7 2009-06-13 07:02:20
