
Decode a Python bytes string without forcing ASCII-only output

I have a bytes object:

b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}'

and I want it printed as Unicode text rather than as ASCII-only escape sequences.

There is a hacky way to do it:

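import json

data = b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}'
decoded = data.decode()
parsed_to_dict = json.loads(decoded)
dumped = json.dumps(parsed_to_dict, ensure_ascii=False)
print(dumped)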

>>> {"node": "Обновление"}

However, the text will not always be parseable as JSON, so I need a simpler way.

Is there a way to print my bytes object (or a decoded Unicode string) as a non-ASCII string without going through JSON parsing and dumping?

For example, how can I print b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435' as Обновление?

A bytes string like

b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'

has been encoded using Unicode escape sequences. To convert it back into a proper Unicode string, you simply need to specify the 'unicode-escape' codec:

data = b'\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'
out = data.decode('unicode-escape')
print(out)

output

Обновление
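
The same codec works on a longer payload whether or not it is valid JSON. A minimal sketch, assuming the payload contains only ASCII bytes; the name `payload` and the deliberately malformed content (note the extra closing brace) are illustrative:

```python
# A payload that json.loads would reject because of the extra closing brace.
payload = b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}}'

# 'unicode-escape' interprets the literal \uXXXX sequences; no JSON parsing is involved.
text = payload.decode('unicode-escape')
print(text)
```

Because nothing is parsed, the surrounding text is passed through as-is and only the escape sequences are interpreted.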

However, if data is already a Unicode string, you first need to encode it to bytes. You can do that with the 'ascii' codec, provided data contains only ASCII characters. If it contains characters outside ASCII but within the range \x80 to \xff, you may be able to use the 'latin1' codec.

data = '\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435'
out = data.encode('ascii').decode('unicode-escape')
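
One caveat is that 'unicode-escape' treats every non-escape byte as Latin-1, so bytes that mix real UTF-8 text with \uXXXX escapes come out garbled. The sketch below is one possible workaround, not part of the answer above: it decodes as UTF-8 first and then replaces only the \uXXXX sequences with a regular expression. The helper name `unescape_u` and the sample bytes are hypothetical, and the sketch does not combine surrogate pairs or honour doubled backslashes.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(text):
    """Hypothetical helper: replace literal backslash-u escapes in a str."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Sample bytes mixing UTF-8 encoded text with escape sequences.
raw = b'caf\xc3\xa9 \\u041e\\u0431'

garbled = raw.decode('unicode-escape')    # UTF-8 bytes are read as Latin-1
fixed = unescape_u(raw.decode('utf-8'))   # UTF-8 text is preserved

print(garbled)
print(fixed)
```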

This should work as long as all the escapes are valid (no lone backslash).

import ast
bytes_object = b'{"node": "\\u041e\\u0431\\u043d\\u043e\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435"}}'

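# Wrap the decoded text in quotes so that literal_eval parses it as a str literal.
# This assumes the text itself contains no unescaped single quotes.
unicode_string = ast.literal_eval("'{}'".format(bytes_object.decode()))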

output:

'{"node": "Обновление"}}'
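
The quoting trick breaks when the text contains a single quote or a malformed escape, since the wrapped text is then no longer a valid string literal. A hedged sketch of a defensive wrapper follows; the helper name `eval_escapes` and the fallback behaviour of returning the text unchanged are assumptions, not part of the answer above:

```python
import ast

def eval_escapes(raw):
    """Hypothetical helper: decode bytes, then interpret escapes via literal_eval."""
    text = raw.decode()
    try:
        result = ast.literal_eval("'{}'".format(text))
    except (SyntaxError, ValueError):
        # For example an unescaped single quote or a truncated \u escape.
        return text
    # Stray quotes can make the wrapped text evaluate to a non-str literal.
    return result if isinstance(result, str) else text

ok = eval_escapes(b'\\u041e\\u0431')
bad = eval_escapes(b"it's \\u041e")

print(ok)
print(bad)
```

In this sketch the second call falls back to the undecoded text, because the apostrophe ends the string literal early.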
