
How to convert list of bytes (unicode) to Python string?

I have a list of bytes (8-bit values; in C/C++ terms they make up a wchar_t string). Taken byte by byte, they form a Unicode string. How do I convert these values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the whole string from that. Thank you.

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.

If you actually have a list of bytes, you can get this object with ''.join(bytelist) (Python 2.x) or b''.join(bytelist) (Python 3.x).
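In Python 3, a "list of bytes" usually takes one of two shapes: a list of one-byte bytes objects, or a list of plain integers in the range 0 to 255. The following is a minimal sketch under that assumption; the variable names are illustrative and not from the original post.

```python
# Python 3 sketch: two common shapes a "list of bytes" can take.
# All variable names here are illustrative.

# Shape 1: a list of one-byte bytes objects -> join them.
byte_objects = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']
joined = b''.join(byte_objects)

# Shape 2: a list of integers in range(256) -> pass them to bytes().
byte_values = [0xd1, 0x82, 0xd0, 0xb5, 0xd1, 0x81, 0xd1, 0x82]
packed = bytes(byte_values)

# Both routes yield the same bytes object, which can then be decoded.
text = joined.decode('utf-8')
print(text)  # тест
```

Note that b''.join() raises a TypeError on a list of integers, so the shape of the list decides which call applies.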

You need to specify the encoding that was used to encode the original Unicode string.
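The question mentions two bytes per character, which suggests the data may be UTF-16 rather than UTF-8. The sketch below assumes 16-bit little-endian code units (as a 2-byte wchar_t on Windows would typically produce); that is an assumption about the asker's data, not something the post confirms.

```python
# Sketch: the encoding decides how bytes are grouped into characters.
# Assumption: the bytes are 16-bit little-endian code units (UTF-16-LE).
raw = bytes([0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04])

# With the matching encoding, each pair of bytes becomes one character.
text = raw.decode('utf-16-le')
print(text)  # тест

# With a mismatched single-byte encoding, every byte becomes its own
# character and the result is unrelated to the intended text.
wrong = raw.decode('latin-1')
print(len(text), len(wrong))  # 4 8
```

If the data is big-endian or starts with a byte order mark, 'utf-16-be' or 'utf-16' would be the codec to try instead.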

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True

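You can also use decode() to convert a list of byte strings into a list of strings. This works only when each element is a complete encoded string on its own; a list holding one byte per element has to be joined first, because a multi-byte character split across elements cannot be decoded piece by piece.

stringlist = [x.decode('utf-8') for x in bytelist]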
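As a hedged illustration of when element-wise decoding applies, the Python 3 sketch below contrasts a list of complete UTF-8 strings with a list that holds one byte per element. The names words, failed and whole are made up for this example.

```python
# Case 1: every element is a complete UTF-8 encoded string,
# so decoding element by element works.
words = [b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', b'test']
stringlist = [x.decode('utf-8') for x in words]
print(stringlist)  # ['тест', 'test']

# Case 2: one byte per element. A lone b'\xd1' is an incomplete
# UTF-8 sequence, so decoding it by itself raises an error.
bytelist = [b'\xd1', b'\x82']
try:
    [x.decode('utf-8') for x in bytelist]
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True

# Joining first and decoding once handles the split character.
whole = b''.join(bytelist).decode('utf-8')
print(whole)  # т
```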

Here's what worked best for me:

import codecs

print(type(data)) # <class 'bytes'>
text: str = codecs.decode(data, 'UTF-8')  # decoded result; data itself stays bytes
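For a bytes object, codecs.decode() and the decode() method give the same result. Here is a small self-contained Python 3 check, with a sample value for data that is assumed for illustration:

```python
import codecs

# Sample input, assumed for illustration: UTF-8 bytes for a short word.
data = b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

via_codecs = codecs.decode(data, 'utf-8')  # function form
via_method = data.decode('utf-8')          # method form

print(via_codecs == via_method)  # True
```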
