
How to convert unicode escape sequences to unicode characters in a Python string

When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\\xf6ld". I want the escape sequence to be returned as the character it represents in the string. How can I do this in Python?

Assuming Python sees the name as a normal (byte) string, you'll first have to decode it to unicode:

>>> name
'Christensen Sk\xf6ld'
>>> unicode(name, 'latin-1')
u'Christensen Sk\xf6ld'

Another way of achieving this:

>>> name.decode('latin-1')
u'Christensen Sk\xf6ld'

Note the "u" in front of the string, signalling that it is a unicode string. If you print it, the accented letter is shown properly:

>>> print name.decode('latin-1')
Christensen Sköld

BTW: when necessary, you can use the "encode" method to turn the unicode string into, e.g., a UTF-8 byte string:

>>> name.decode('latin-1').encode('utf-8')
'Christensen Sk\xc3\xb6ld'
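For readers on Python 3, where str is already Unicode and decode() only exists on bytes, the same round trip might look roughly like the sketch below. It assumes the raw value really is Latin-1 encoded bytes; the variable names are only illustrative.

```python
# Python 3 sketch: the raw value is assumed to be Latin-1 encoded bytes.
raw = b'Christensen Sk\xf6ld'

# bytes -> str (Unicode text)
text = raw.decode('latin-1')
print(text == 'Christensen Sköld')

# str -> UTF-8 encoded bytes; the single byte 0xf6 becomes the pair 0xc3 0xb6
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)
```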

Given a byte string containing Unicode escapes, such as b"\\N{SNOWMAN}", calling b"\\N{SNOWMAN}".decode('unicode-escape') produces the expected Unicode string u'\u2603' (☃).
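If the string from the question really contains a literal backslash followed by "xf6" (as the doubled backslash in the output suggests), the unicode-escape codec can be applied to it as well. Here is a minimal Python 3 sketch under that assumption; unescape is a hypothetical helper name, not a standard library function.

```python
def unescape(text):
    """Turn literal escape sequences such as \\xf6 into the characters they name.

    Hypothetical helper for illustration. The text is first encoded to bytes,
    because the unicode_escape codec decodes bytes. 'backslashreplace' rewrites
    characters outside Latin-1 as \\uXXXX escapes so they survive the round trip.
    """
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


# The escape sequence is stored as plain characters: backslash, x, f, 6.
escaped = 'Christensen Sk\\xf6ld'
print(unescape(escaped) == 'Christensen Sköld')

# Named escapes in a byte string are handled by the same codec.
print(b'\\N{SNOWMAN}'.decode('unicode_escape') == '\u2603')
```

Note that this treats every backslash in the input as the start of an escape, so it is only appropriate when the text is known to be escaped in this way.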

I suspect that it's actually working correctly. The interactive interpreter echoes the repr() of a value, and in Python 2 the repr of a unicode string escapes non-ASCII characters, since not all terminals support unicode. If you actually print the string, though, it should work. See the following example:

>>> u'\xcfa'
u'\xcfa'
>>> print u'\xcfa'
Ïa
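The difference between the escaped representation and the string itself can also be checked in Python 3. In this sketch, ascii() stands in for the escaping that Python 2's repr performs, while Python 3's own repr leaves printable non-ASCII characters alone.

```python
s = '\xcfa'

# ascii() escapes non-ASCII characters, much like repr did in Python 2.
escaped_form = ascii(s)
print(escaped_form)

# Python 3's repr keeps printable non-ASCII characters as they are.
plain_form = repr(s)

# Either way, the string itself holds just two characters.
print(len(s))
```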
