Decoding UTF-8 in Python

Question

I have an expression like this, which produces the list of bytes in the UTF-8 representation of a code point:

list(chr(number).encode("utf-8"))

But how do I do this in reverse?

Say I have 2 bytes [292, 200] as a list. How can I decode them into a symbol?

Answer 1

You can call bytes() on a list of integers in the range 0..255.

So your example reverses like this:

>>> bytes([195, 136]).decode('utf8')
'È'

If you want the code point, wrap it in ord():

>>> ord(bytes([195, 136]).decode('utf8'))
200

Note: the last step only works if the byte sequence corresponds to a single Unicode character (code point).
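
A minimal sketch of the same idea for byte lists that may hold more than one character. The helper names decode_byte_list and code_points are made up for illustration and are not part of the standard library; the only real calls are bytes(), bytes.decode() and ord().

```python
def decode_byte_list(byte_values):
    """Turn a list of ints (each in 0..255) into a str by decoding it as UTF-8."""
    return bytes(byte_values).decode("utf-8")


def code_points(byte_values):
    """Return one code point per decoded character."""
    return [ord(ch) for ch in decode_byte_list(byte_values)]


# Round trip for a single character.
assert decode_byte_list(list(chr(200).encode("utf-8"))) == "È"

# Two characters: calling ord() on the whole string would raise TypeError here.
print(code_points([195, 136, 226, 130, 172]))  # [200, 8364]

# bytes() rejects values outside 0..255 with ValueError.
try:
    decode_byte_list([292, 200])
except ValueError as exc:
    print("not a list of bytes:", exc)
```

Applying ord() per character avoids the single-character restriction mentioned in the note. A byte list that is in range but is not valid UTF-8 raises UnicodeDecodeError from decode() instead.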

Answer 2

Keep in mind that only code points 0 to 127 fit into a single byte in UTF-8. If number is larger than that, chr(number).encode("utf-8") produces several bytes, and reading them back with int.from_bytes no longer gives the original number.

number = 127
print(f"number: {number}")
li = list(chr(number).encode("utf-8"))
print(f"List of byte: {li}")
dec = int.from_bytes(li, byteorder='big')
print(f"Type dec: {type(dec)}")
print(f"Value dec: {dec}")

number = 128
print(f"number: {number}")
li = list(chr(number).encode("utf-8"))
print(f"List of byte: {li}")
dec = int.from_bytes(li, byteorder='big')
print(f"Type dec: {type(dec)}")
print(f"Value dec: {dec}")
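
As a rough sketch of what the two snippets above print, and of how decoding differs from int.from_bytes, the values can be compared side by side. The function name compare is hypothetical and only used here for illustration.

```python
def compare(number):
    """Encode a code point to UTF-8, then read the bytes back in two ways."""
    li = list(chr(number).encode("utf-8"))
    as_int = int.from_bytes(li, byteorder="big")   # raw bytes read as one big-endian integer
    recovered = ord(bytes(li).decode("utf-8"))     # UTF-8 decoding gives the code point back
    return li, as_int, recovered


print(compare(127))  # ([127], 127, 127)
print(compare(128))  # ([194, 128], 49792, 128)
```

For 127 both readings agree because the encoding is a single byte. For 128 the encoding is two bytes, so int.from_bytes returns 194 * 256 + 128 = 49792, while decoding returns 128.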

Take a look at the Python documentation on converting values.


Answer 1 · 2 · 2020-05-09 12:02:03

Answer 2 · 1 · accepted · 2020-05-09 12:41:52
