How to convert Unicode to its original character in Python

Question

I first tried typing in a Unicode character, encoding it in UTF-8, and decoding it back. Python happily gives back the original character. I took a look at the encoded string; it is b'\xe6\x88\x91'. I don't understand what this is; it looks like 3 hex numbers.

Then I did some research and found that the CJK set starts from 4E00, so now I want Python to show me what this character looks like. How do I do that? Do I need to convert 4E00 into something like the form above?

Answer 1

You'll need to decode it using the UTF-8 encoding:

>>> print(b'\xe6\x88\x91'.decode('UTF-8'))
我

By decoding it you're turning the bytes (which is what b'...' is) into a Unicode string, and that's how you can display or use the text.
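As a minimal sketch of the round trip described in the question (Python 3 assumed; the variable names are only illustrative), encoding and decoding are inverse operations:

```python
# Round trip: str -> bytes (encode) -> str (decode)
text = "我"
encoded = text.encode("utf-8")     # a bytes object holding the UTF-8 encoding
decoded = encoded.decode("utf-8")  # back to a str

print(encoded)           # b'\xe6\x88\x91'
print(len(encoded))      # 3
print(decoded == text)   # True
```

The three `\x..` escapes are the three bytes that UTF-8 uses to store this one character, which is why the encoded form looks like three hex numbers.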

Answer 2

The text b'\xe6\x88\x91' is the representation of the bytes that are the UTF-8 encoding of the Unicode code point \u6211, which is the character 我. So there is no need to convert anything, other than to a Unicode string with .decode('utf-8').
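For the 4E00 part of the question, one possible approach (a sketch, assuming Python 3, not something either answer spells out) is to go from the code point straight to the character with the built-in chr(), without building the UTF-8 bytes by hand. The variable names below are illustrative:

```python
# chr() maps an integer code point to a one-character str; ord() is the inverse.
first_cjk = chr(0x4E00)             # the code point mentioned in the question
from_hex_text = chr(int("4E00", 16))  # same thing, starting from a hex string

print(first_cjk)                    # 一
print(first_cjk == from_hex_text)   # True
print(hex(ord("我")))               # 0x6211
print("\u6211" == "我")             # True

# The UTF-8 bytes are just one way of storing that code point.
print(first_cjk.encode("utf-8"))    # b'\xe4\xb8\x80'
```

So the code point (4E00, 6211) identifies the character, while b'\xe4\xb8\x80' or b'\xe6\x88\x91' is how UTF-8 serializes it.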

Answer 1
0 2014-11-26 20:09:02

Answer 2
0 Accepted 2014-11-26 20:10:30
