Python: remove ctrl-characters from a string

I have a bunch of XML files dumped to disk in batches. When I tried to parse them I found that some had a control character inserted into an attribute.

It looked like this:

<root ^KIND="A"></root>

When it was supposed to look like this:

<root KIND="A"></root>

Now in this case it was easily fixed with some regexp magic:

import re
xml = re.sub(r'<([^>]*)\v([^>]*)>', r'<\1K\2>', xml)

But then the requirements changed and I had to dump the docs out to disk individually. Naturally I ran the substitution before saving so I wouldn't have that problem again.

There are a lot of these documents, you see, many millions...

And so, I was getting ready to extract some data from them again.

This time, however, I got a new error:

<root KIND="A"><CLASSIFICATION></CLASSIFICATIO^N></root>

When it was supposed to look like this:

<root KIND="A"><CLASSIFICATION></CLASSIFICATION></root>

I am not sure why I keep getting these errors, nor why it's always ctrl-characters that are inserted. It might be pure luck so far.

The regexp I used in the first case won't work in general. ^K translates to vertical tab, so I could match against that. But is there some way to filter out any ctrl-character?

Try using a translate table to map ctrl-A through ctrl-Z back to the letters A through Z:

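# ctrl-A through ctrl-Z are code points 1 through 26
in_chars = ''.join(chr(x) for x in range(1, 27))
out_chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# note: this range includes tab (9), newline (10) and carriage return (13),
# which will be turned into 'I', 'J' and 'M' as well
tr_table = str.maketrans(in_chars, out_chars)

# pass all strings through the translate table:
x = input('Enter text: ')
print(x.translate(tr_table))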

Prints:

Enter text: abc^Kdef
abcKdef
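
One thing to watch when running whole XML files through a table covering 1 to 26: tab, newline and carriage return are themselves ctrl-I, ctrl-J and ctrl-M, so they would be turned into letters too. Below is a minimal sketch of a variant that leaves those three alone. The names `KEEP`, `TR_TABLE` and `restore_ctrl_letters` are made up for illustration, and it assumes the corruption is always "a letter replaced by the matching ctrl-character", which the question itself is not sure about.

```python
# Control characters that are legal in XML text and must survive untouched.
KEEP = {'\t', '\n', '\r'}

# Map ctrl-A..ctrl-Z (code points 1..26) back to 'A'..'Z', skipping KEEP.
# chr(code + 64) is the matching letter: ctrl-K is 0x0b, and 0x0b + 64 is 'K'.
TR_TABLE = {
    code: chr(code + 64)
    for code in range(1, 27)
    if chr(code) not in KEEP
}


def restore_ctrl_letters(text):
    """Replace stray ctrl-A..ctrl-Z characters with their letters."""
    return text.translate(TR_TABLE)


# Hypothetical sample combining both errors from the question.
sample = '<root \x0bIND="A">\n\t<CLASSIFICATION></CLASSIFICATIO\x0e>\n</root>'
print(restore_ctrl_letters(sample))
```

If a genuine letter I, J or M ever gets corrupted this way, this variant cannot tell it apart from real whitespace, so it is worth parsing the result afterwards to confirm it is well-formed.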
