How to convert list of bytes (unicode) to Python string?

I have a list of bytes (8-bit values; in C/C++ terms they would make up a wchar_t string). Taken together, byte by byte, they form a Unicode string. How do I convert these values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the whole string from them. Thank you.

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.

If you actually have a list of bytes, you can get this object with ''.join(bytelist) (Python 2) or b''.join(bytelist) (Python 3).
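In Python 3 a "list of bytes" can come in two shapes: a list of one-byte bytes objects, or a list of integers in range(256). The sketch below shows one way to turn each shape into a single bytes object; the variable names and sample values are made up for illustration.

```python
# Python 3: two common shapes a "list of bytes" can take (sample data is illustrative).
byte_objects = [b'\xd1', b'\x82', b'\xd0', b'\xb5']  # list of 1-byte bytes objects
byte_values = [0xD1, 0x82, 0xD0, 0xB5]               # list of ints in range(256)

# b''.join() concatenates bytes objects; bytes() accepts an iterable of ints.
joined_from_objects = b''.join(byte_objects)
joined_from_values = bytes(byte_values)

print(joined_from_objects == joined_from_values)
print(joined_from_objects.decode('utf-8'))
```

Note that b''.join() raises a TypeError when given integers, so the bytes() constructor is the one to reach for in that case.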

You need to specify the encoding that was used to encode the original Unicode string.
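The question mentions two bytes per character, which suggests (but does not confirm) a UTF-16 encoding. As a hedged illustration for Python 3, the sketch below assumes the bytes are UTF-16 little-endian and shows that picking the wrong byte order decodes without an error but yields different text. The sample values are an assumption, not data from the question.

```python
# Assumed input: the word 'тест' encoded as UTF-16 little-endian, one int per byte.
byte_values = [0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04]
raw = bytes(byte_values)

# Decoding with the matching encoding recovers the original text.
text = raw.decode('utf-16-le')

# The same bytes read as big-endian decode silently into unrelated characters.
wrong = raw.decode('utf-16-be')

print(text)
print(text == wrong)
```

If the data starts with a byte order mark, the plain 'utf-16' codec can detect the byte order itself.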

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
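For comparison, here is a rough Python 3 counterpart of the session above. It is a sketch, not part of the original answer: in Python 3 the list elements have to be bytes literals, and the joined value is a bytes object that compares equal to the encoded text rather than to the str itself.

```python
# Python 3 counterpart of the Python 2 demo (same UTF-8 bytes for 'тест').
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

joined = b''.join(bytelist)    # a bytes object
text = joined.decode('utf-8')  # a str (Unicode) object

print(text)
print(text == 'тест')
print(joined == 'тест'.encode('utf-8'))
```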

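You can also use decode() to convert a list of bytes into a list of strings. This decodes each element separately, so it only works when every element is a complete encoded character (for example, ASCII-only data):

stringlist = [x.decode('utf-8') for x in bytelist]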
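Element-by-element decoding has a limit worth seeing in practice. The Python 3 sketch below uses made-up sample lists to show that it succeeds when each element is a whole character, and fails when a multi-byte UTF-8 character is split across elements, in which case joining first is needed.

```python
# Each element is a complete ASCII character: per-element decoding works.
ascii_list = [b'a', b'b', b'c']
decoded_ascii = [x.decode('utf-8') for x in ascii_list]
print(decoded_ascii)

# One 2-byte UTF-8 character split across two elements: per-element decoding fails.
split_list = [b'\xd1', b'\x82']
failed = False
try:
    [x.decode('utf-8') for x in split_list]
except UnicodeDecodeError:
    failed = True
print(failed)

# Joining before decoding handles the split character.
joined_text = b''.join(split_list).decode('utf-8')
print(joined_text)
```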

Here's what worked the best for me:

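import codecs

data = b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'  # sample input (assumed); any bytes object works
print(type(data))  # <class 'bytes'>
text: str = codecs.decode(data, 'UTF-8')  # decoded result kept under a separate name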
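As a side note, codecs.decode() and the bytes.decode() method give the same result for a bytes object, and both accept an errors argument. The Python 3 sketch below uses assumed sample data to show this, along with errors='replace' as one option for input that may be truncated.

```python
import codecs

data = b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'  # sample UTF-8 bytes (assumed input)

# Both calls decode the same bytes with the same codec.
via_codecs = codecs.decode(data, 'utf-8')
via_method = data.decode('utf-8')
print(via_codecs == via_method)

# A truncated sequence raises under the default errors='strict';
# errors='replace' substitutes U+FFFD for the undecodable part instead.
truncated = data[:3]
lenient = truncated.decode('utf-8', errors='replace')
print(lenient)
```

Whether to replace, ignore, or raise on bad input depends on where the bytes come from; strict decoding is the safer default when corrupted data should not pass silently.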
