
How to convert a Python Unicode string to bytes

I have a string x as below:

x = "\xe9\x94\x99\xe8\xaf\xaf"

This string should be a Unicode string, but it cannot be displayed (printed) correctly.

The value y, on the other hand, is a bytes literal (it starts with b), and it can be displayed correctly with y.decode('utf-8'):

y = b"\xe9\x94\x99\xe8\xaf\xaf"

My question is: how do I convert x to y?
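The question doesn't say where x came from. One common way to end up with a string like it (an assumption here, not something stated above) is decoding UTF-8 bytes with the Latin-1 codec, which turns every byte into exactly one code point. A minimal sketch that reproduces x from y under that assumption:

```python
# Assumption: x was produced by decoding UTF-8 bytes with the wrong codec.
y = b"\xe9\x94\x99\xe8\xaf\xaf"   # UTF-8 encoding of '错误'
x = y.decode("latin-1")           # each byte becomes one code point in U+0000..U+00FF

print(len(y), len(x))                     # 6 6
print(x == "\xe9\x94\x99\xe8\xaf\xaf")    # True
print(y.decode("utf-8"))                  # 错误
```

This also explains why printing x shows garbage: the six code points are accented Latin letters and control characters, not the two Chinese characters the bytes were meant to encode.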

Assuming we're talking about Python 3, the Unicode string x is 6 code points long. It happens that each of those code points is in the range 0x00 to 0xff (the Latin-1 subset; ASCII only goes up to 0x7f). We can get the exact byte string with the raw_unicode_escape codec, like this:

>>> x = "\xe9\x94\x99\xe8\xaf\xaf"
>>> y = x.encode('raw_unicode_escape')
>>> y
b'\xe9\x94\x99\xe8\xaf\xaf'
>>> y.decode('utf8')
'错误'
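A closely related option, not mentioned in the answer, is the latin-1 codec, which maps code points U+0000 to U+00FF one-to-one onto the bytes 0x00 to 0xFF. A short sketch showing that, for this particular x, it yields the same bytes as raw_unicode_escape:

```python
x = "\xe9\x94\x99\xe8\xaf\xaf"

# 'latin-1' encodes each code point below 256 as the byte with the same value.
y = x.encode("latin-1")
print(y)                    # b'\xe9\x94\x99\xe8\xaf\xaf'
print(y.decode("utf-8"))    # 错误

# For a string that stays within U+0000..U+00FF, both codecs agree.
print(y == x.encode("raw_unicode_escape"))  # True
```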

Note that this only works if x contains nothing but code points from the Latin-1 subrange of Unicode (U+0000 to U+00FF); otherwise you'll just get escaped Unicode code points (as the codec's name suggests):

>>> "šž".encode('raw_unicode_escape')
b'\\u0161\\u017e'
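Because raw_unicode_escape silently emits \u escapes for code points above U+00FF, you may prefer a conversion that fails loudly instead. A minimal sketch using a hypothetical helper named fix_mojibake (the name is illustrative, not a library function); it relies on the latin-1 codec raising UnicodeEncodeError for out-of-range code points:

```python
def fix_mojibake(s: str) -> str:
    """Hypothetical helper: re-interpret a Latin-1-decoded string as UTF-8.

    Raises UnicodeEncodeError if s contains a code point above U+00FF,
    and UnicodeDecodeError if the recovered bytes are not valid UTF-8.
    """
    return s.encode("latin-1").decode("utf-8")


print(fix_mojibake("\xe9\x94\x99\xe8\xaf\xaf"))  # 错误

try:
    fix_mojibake("šž")  # U+0161 and U+017E are outside Latin-1
except UnicodeEncodeError as exc:
    print("not recoverable:", exc)
```

Going through latin-1 rather than raw_unicode_escape is a design choice: an exception is usually easier to debug than a byte string that quietly contains literal \u sequences.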
