Decoding UTF-8 in Python

I have an expression like this that produces the list of bytes of the utf-8 representation.

list(chr(number).encode("utf-8"))

But how to do this in reverse?

Say, I have 2 bytes [292, 200] as a list, how can I decode them into a symbol?

You can call bytes on a list of integers in the range 0..255.

So your example reverses like this:

>>> bytes([195, 136]).decode('utf8')
'È'

If you want the codepoint, wrap it in ord():

>>> ord(bytes([195, 136]).decode('utf8'))
200

Note: the last step only works if the byte sequence corresponds to a single Unicode character (codepoint).
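If the list may hold more than one character, or values outside 0..255, a small helper can make both cases explicit. This is only a minimal sketch; the name decode_byte_list is hypothetical and not part of the answer above.

```python
def decode_byte_list(values):
    """Decode a list of ints (each 0..255) as UTF-8.

    Returns the decoded text and the list of its code points.
    decode_byte_list is a hypothetical helper name used for illustration.
    """
    try:
        raw = bytes(values)  # raises ValueError if any value is outside 0..255
    except ValueError as exc:
        raise ValueError(f"not a valid byte list: {values!r}") from exc
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on malformed UTF-8
    return text, [ord(ch) for ch in text]


# Two characters: 'È' (2 bytes) followed by '€' (3 bytes)
text, codepoints = decode_byte_list([195, 136, 226, 130, 172])
print(text, codepoints)  # È€ [200, 8364]
```

A list such as [292, 200] is rejected at the bytes() step, because 292 does not fit in a single byte.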

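  1. Keep in mind that only code points 0 to 127 encode to a single UTF-8 byte. Python's chr() itself is not limited to 8 bits (it accepts 0 to 0x10FFFF), but larger code points encode to two to four bytes. int.from_bytes does not decode UTF-8, so it returns the original number only when 'number' is 127 or less.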

     number = 127
     print(f"number: {number}")
     li = list(chr(number).encode("utf-8"))  # a single byte: [127]
     print(f"List of byte: {li}")
     dec = int.from_bytes(li, byteorder='big')  # reads the bytes as one integer
     print(f"Type dec: {type(dec)}")  # <class 'int'>
     print(f"Value dec: {dec}")  # 127, same as number

     number = 128
     print(f"number: {number}")
     li = list(chr(number).encode("utf-8"))  # two bytes: [194, 128]
     print(f"List of byte: {li}")
     dec = int.from_bytes(li, byteorder='big')  # 194 * 256 + 128, not a UTF-8 decode
     print(f"Type dec: {type(dec)}")  # <class 'int'>
     print(f"Value dec: {dec}")  # 49792, not 128

    Take a look at the Python documentation on converting values.
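For code points above 127, a round trip that gives back the original number has to decode the bytes as UTF-8 rather than read them as one big-endian integer. The sketch below assumes 'number' is a valid, encodable code point (surrogates in the range 0xD800 to 0xDFFF cannot be encoded as UTF-8); the function name roundtrip is hypothetical.

```python
def roundtrip(number):
    """Encode a code point to UTF-8 bytes, then decode it back.

    roundtrip is a hypothetical name used for illustration only.
    """
    encoded = list(chr(number).encode("utf-8"))     # 1 to 4 byte values
    decoded = ord(bytes(encoded).decode("utf-8"))   # back to the code point
    return encoded, decoded


for number in (127, 128, 200, 8364, 128512):
    encoded, decoded = roundtrip(number)
    print(f"{number} -> {encoded} -> {decoded}")
```

Each line printed should end with the same number it started with, regardless of how many bytes the encoding needed.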
