
How to convert unicode to its original character in Python

I first tried typing in a Unicode character, encoding it in UTF-8, and decoding it back. Python happily gives back the original character. I took a look at the encoded string: it is b'\xe6\x88\x91'. I don't understand what this is; it looks like three hex numbers.

Then I did some research and found that the CJK set starts at 4E00, so now I want Python to show me what this character looks like. How do I do that? Do I need to convert 4E00 into something like the form above?

You'll need to decode it using the UTF-8 encoding:

>>> print(b'\xe6\x88\x91'.decode('UTF-8'))
我

By decoding it you're turning the bytes (which is what b'...' is) into a Unicode string, and that's how you can display or use the text.
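To make the round trip concrete, here is a minimal sketch assuming Python 3 (where `str` is Unicode text and `bytes` is raw data). The variable names are only illustrative. It also shows that the "3 hex numbers" in the question are simply the three byte values of the encoded character.

```python
# Round trip: text -> UTF-8 bytes -> text (assumes Python 3).
text = "我"

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(encoded)          # b'\xe6\x88\x91'
print(len(encoded))     # 3, one character became three bytes
print(list(encoded))    # [230, 136, 145], i.e. 0xe6, 0x88, 0x91
print(decoded == text)  # True
```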

The text b'\xe6\x88\x91' is the representation of the bytes that are the UTF-8 encoding of the Unicode code point U+6211 (written '\u6211' in a Python string literal), which is the character 我. So there is no need to convert anything, other than decoding the bytes to a Unicode string with .decode('utf-8').
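For the 4E00 part of the question, a code point does not have to be turned into UTF-8 bytes by hand. The sketch below is an addition rather than part of the answers above; it assumes Python 3 and uses the built-ins `chr()` and `ord()` to go between a code point number and its character.

```python
# Code point <-> character (assumes Python 3).
code_point = 0x4E00

char = chr(code_point)        # integer code point -> one-character str
print(char)                   # 一
print(char.encode("utf-8"))   # b'\xe4\xb8\x80'

# If the code point arrives as hex text, parse it first.
from_text = chr(int("4E00", 16))
print(from_text == char)      # True

# The same character can be written with a \u escape in a literal.
print("\u4e00" == char)       # True

# ord() goes the other way: character -> code point.
print(hex(ord("我")))         # 0x6211
```

So the byte form shown earlier only matters when the text has to be stored or transmitted; to look at a character, `chr()` on the code point is enough.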
