How to convert Unicode to its original character in Python
I first tried typing in a Unicode character, encoding it in UTF-8, and decoding it back. Python happily gives back the original character. I took a look at the encoded string; it is b'\xe6\x88\x91'. I don't understand what this is. It looks like three hex numbers.
Then I did some research and found that the CJK set starts at 4E00, so now I want Python to show me what this character looks like. How do I do that? Do I need to convert 4E00 into something like the form above?
You'll need to decode it using the UTF-8 encoding:
>>> print(b'\xe6\x88\x91'.decode('UTF-8'))
我
By decoding it you're turning the bytes (which is what b'...' is) into a Unicode string, and that is how you can display and use the text.
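As a small illustration of the bytes versus string distinction, the round trip from the question can be sketched as follows. The variable names `encoded` and `decoded` are chosen only for this example and do not come from the question.

```python
# Encoding a str produces a bytes object; for this character it holds three bytes.
encoded = "我".encode("utf-8")
print(type(encoded).__name__, encoded)          # bytes b'\xe6\x88\x91'

# Each \xNN in the repr is one byte shown in hexadecimal.
print(len(encoded), [hex(b) for b in encoded])  # 3 ['0xe6', '0x88', '0x91']

# Decoding with the same encoding turns the bytes back into a str.
decoded = encoded.decode("utf-8")
print(type(decoded).__name__, decoded)          # str 我
```

So the "three hex numbers" are simply the three bytes that UTF-8 uses to store this one character.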
The text b'\xe6\x88\x91' is the representation of the bytes that are the UTF-8 encoding of the Unicode code point U+6211 (written '\u6211' in Python), which is the character 我. So there is no need to convert anything, other than to a Unicode string with .decode('utf-8').
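For the second half of the question (displaying the character at 4E00), one possible approach, not spelled out in the answers above, is the built-in `chr()`, which maps a code point number to a one-character string; `ord()` goes the other way. A minimal sketch, with `first_cjk` as a name made up for this example:

```python
# chr() turns a code point number into the corresponding character.
first_cjk = chr(0x4E00)
print(first_cjk)                  # 一

# The code point is not the same as its UTF-8 bytes; encoding shows the bytes.
print(first_cjk.encode("utf-8"))  # b'\xe4\xb8\x80'

# ord() recovers the code point of a character.
print(hex(ord("我")))             # 0x6211

# A \uXXXX escape in a string literal names the same code point.
print("\u6211" == "我")           # True
```

In other words, 4E00 does not need to be rewritten in the b'\x..' form by hand. The code point identifies the character, and the bytes form only appears once the string is encoded.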