How to fix the encoding of this bytes object to keep only the actual text and remove '\\x00\\x05*\\x00\\x00\\x0e\\x00bjbj' in Python 3?

Question

The Problem:

I am using an API that retrieves the content of interest in the form of a bytes object.

The bytes object (myobj) has a value of:

myobj = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00This is \rthe sentence \rI want to \rkeep.\r\r\x03\r\r\x04\r\r\x03\r\r\x04\x017\x00\x06'

The Question:

How do I keep only this: "This is the sentence I want to keep."

What I've Tried:

1: I tried decoding with UTF-8, but the output was the same as the input. I also tried 'ascii', 'utf-16', and 'utf-8'. If I remove the 'ignore' argument, I receive an error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

myobj.decode('utf-8', 'ignore')

2: I tried using printable from the string module, which returned almost the same output as the input.

import string

# str() on bytes gives its repr, e.g. "b'\xd0\xcf...'", so every character is already printable.
mystr = str(myobj)
print(''.join(x for x in mystr if x in string.printable))

3: I also tried using strip() and replace() to remove portions of the string, but there are too many distinct characters.

Any suggestions would be great.

Thanks!

Answer 1

You've almost got it. Combine options 1 and 2:

new_obj = ''.join(c for c in myobj.decode('utf-8', 'ignore') if c.isprintable())
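
Spelled out step by step, the same idea might look like the sketch below. One deviation from the one-liner above, flagged as an assumption: it decodes with 'ascii' instead of 'utf-8', on the premise that the wanted text is plain ASCII. With 'utf-8', the stray header bytes \xe0\xa1\xb1 happen to form a valid three-byte sequence (U+0871), and whether that character passes isprintable() depends on the Unicode tables bundled with your Python version.

```python
myobj = (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00This is \rthe sentence '
         b'\rI want to \rkeep.\r\r\x03\r\r\x04\r\r\x03\r\r\x04\x017\x00\x06')

# Step 1 (assumption: the wanted text is plain ASCII): drop every byte >= 0x80.
decoded = myobj.decode('ascii', 'ignore')

# Step 2: drop control characters such as '\r', '\x00' and '\x03'.
new_obj = ''.join(c for c in decoded if c.isprintable())

print(repr(new_obj))
```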

However, your new_obj will be:

'This is the sentence I want to keep.7'

That's because, near the end of myobj, you've got '\x017'. That's a byte with a value of 0x01 followed by the character '7'.
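
One way to get rid of that trailing '7' is to stop filtering character by character and instead keep only longer runs of text, similar in spirit to the Unix strings utility. The sketch below is an illustration under assumptions, not part of the original answer: the helper name extract_text_runs and the min_len=4 threshold are hypothetical, and it assumes the wanted text is plain ASCII with '\r' as the only separator inside it.

```python
import re

myobj = (b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00This is \rthe sentence '
         b'\rI want to \rkeep.\r\r\x03\r\r\x04\r\r\x03\r\r\x04\x017\x00\x06')

def extract_text_runs(data, min_len=4):
    """Return printable-ASCII runs of at least min_len bytes, with CRs removed."""
    # Printable ASCII is 0x20-0x7e; '\r' is allowed so line breaks do not split a run.
    pattern = re.compile(rb'[\x20-\x7e\r]{%d,}' % min_len)
    runs = []
    for match in pattern.findall(data):
        text = match.decode('ascii').replace('\r', '').strip()
        if text:  # skip runs made up only of carriage returns or spaces
            runs.append(text)
    return runs

print(extract_text_runs(myobj))
```

A lone character such as the '7' is shorter than min_len and gets dropped. A threshold that is too high would also drop short legitimate fragments, so it would need tuning for real data.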

Solution 1
1 accepted 2020-08-24 19:14:30