How to remove those "\x00\x00"
How to remove those "\x00\x00" in a string? I have many of those strings (example shown below). I can use re.sub to replace those "\x00", but I am wondering whether there is a better way to do that. Converting between unicode, bytes and string is always confusing.
'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Use rstrip:
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string.
>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> a.replace('\x00','')
'Hello'
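The two snippets above give the same result on the example string, but they differ once a NUL appears somewhere other than the end. A minimal sketch of the difference, using a made-up sample string that is not from the question:

```python
# Hypothetical sample: one embedded NUL plus trailing NUL padding.
text = 'Hel\x00lo\x00\x00\x00'

# rstrip only trims the trailing run; the embedded NUL survives.
trailing_only = text.rstrip('\x00')

# replace removes every NUL, wherever it appears.
all_removed = text.replace('\x00', '')

print(repr(trailing_only))  # 'Hel\x00lo'
print(repr(all_removed))    # 'Hello'
```

Which one is appropriate depends on whether interior NULs are padding or data.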
I think the more general solution is to use:
cleanstring = nullterminatedstring.split('\x00',1)[0]
This splits the string once, using \x00 as the delimiter. With a maximum of one split, split(...) returns a list of at most two elements: everything before the first null and everything after it (the delimiter itself is removed). Appending [0] returns only the portion of the string before the first null (\x00) character, which I believe is what you're looking for.
The convention in some languages, specifically C-like ones, is that a single null character marks the end of the string. For example, you should also expect to see strings that look like:
'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'
The answer supplied here will handle that situation as well as the other examples.
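As a rough sketch, the same idea can be wrapped in a small helper that accepts either str or bytes. The name c_string is hypothetical, and the helper uses partition rather than split('\x00', 1)[0]; for this purpose the two are equivalent, and partition returns the whole input when no null is present.

```python
def c_string(value):
    """Return the part of value before the first NUL (str or bytes)."""
    # Pick a separator of the same type as the input.
    nul = b'\x00' if isinstance(value, (bytes, bytearray)) else '\x00'
    # partition returns (head, separator, tail); only the head is wanted.
    head, _, _ = value.partition(nul)
    return head

print(repr(c_string('Hello\x00dpiecesofsomeoldstring\x00\x00\x00')))  # 'Hello'
print(repr(c_string(b'Hello\x00\x00')))                               # b'Hello'
print(repr(c_string('no terminator')))                                # 'no terminator'
```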
Building on the answers supplied, I suggest that strip() is more generic than rstrip() for cleaning up a data packet, as strip() removes chars from both the beginning and the end of the supplied string, whereas rstrip() only removes chars from the end.
However, strip() does not treat NUL chars as whitespace by default, so you need to specify them explicitly. This can catch you out, as the NUL chars are usually invisible when the string is passed to print(). My solution was to clean the string using ".strip().strip('\x00')":
>>> arbBytesFromSocket = b'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii')
>>> print(arbBytesAsString)
hello
>>> str(arbBytesAsString)
'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii').strip().strip('\x00')
>>> str(arbBytesAsString)
'hello'
>>>
This gives you the string/byte array required, without the NUL chars on each end, and also preserves any NUL chars inside the "data packet", which is useful for received byte data that may contain valid NUL chars (e.g. a C-type structure).
NB. In this case the packet must be "wrapped", i.e. surrounded by non-NUL chars (prefix and suffix), to allow correct detection, and thus only strip unwanted NUL chars.
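The same trimming can also be done on the bytes object before decoding. A minimal sketch, assuming a made-up packet with one NUL inside the payload to show that interior NULs are kept:

```python
# Hypothetical packet: NUL padding on both ends, one NUL inside the payload.
packet = b'\x00\x00\x00\x00he\x00llo\x00\x00\x00\x00'

# bytes.strip takes a bytes argument listing the byte values to trim.
payload = packet.strip(b'\x00')

# Decode afterwards; the interior NUL is still part of the data.
text = payload.decode('ascii')

print(payload)     # b'he\x00llo'
print(repr(text))  # 'he\x00llo'
```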
I tried strip and rstrip and they didn't work, but this one did. Use split and then join the result list:
if '\x00' in name:
    name = ' '.join(name.split('\x00'))
I ran into this problem copying lists out of Excel. Process was:
The problem was that reading the clipboard intermittently returned multiple '\x00' at the end of the text.
I have changed from using win32clipboard to using pyperclip to read the clipboard, and it seems to have resolved the problem.
Neil above wrote, '...you might want to put some thought into why you have them in the first place.' For my own issue with this error code, this led me to the problem. The saved file I was reading from was in Unicode. Once I re-saved the file as plain ASCII text, the problem was solved.
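One common way to end up with a NUL after every character is reading UTF-16 data as if it were a single-byte encoding. Whether that was the cause here is an assumption (the answer only says the file was "in unicode"), but the effect is easy to reproduce, and decoding with the matching codec avoids the need to strip anything:

```python
# Assumption: "unicode" in the answer above means UTF-16, as in many Windows tools.
raw = 'Hello'.encode('utf-16-le')
print(raw)  # b'H\x00e\x00l\x00l\x00o\x00'

# Decoding with a single-byte codec turns every padding byte into a NUL character.
wrong = raw.decode('latin-1')
print(repr(wrong))  # 'H\x00e\x00l\x00l\x00o\x00'

# Decoding with the matching codec yields clean text.
right = raw.decode('utf-16-le')
print(repr(right))  # 'Hello'
```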