Replacing non-alphanumeric characters in regex match using Python
I have a text file (Verilog) that contains certain string sequences (escaped identifiers) that I want to modify. In the example below, I want to find any group starting with '\\' and ending with ' ' (any printable character can be in between).
After finding a group that matches these criteria, I want to replace all non-alphanumeric characters with alphanumeric ones (I don't really care which alphanumeric characters they get replaced with).
In[1]: here i$ \$0me text to \m*dify
Out[1]: here i$ aame text to madify
I have no problem finding the groups that need replacing using regex. However, if I just use re.findall(), I no longer have the location of the words in the string to reconstruct the string after modifying the matched groups.
Is there a way to preserve the location of the words in the string while modifying each match separately?
Note: I previously asked a very similar question here, but I oversimplified my example. I thought editing my existing question would make the existing comments and answers confusing to future readers.
My answer to your previous question still applies, with some minor modifications. Only the regex changes.
Since this is more complex, define a function to pass as a callback.
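In [57]: import re
    ...: def foo(m):
    ...:     # Drop the backslash, keep letters, replace everything else (digits included) with 'a'.
    ...:     return ''.join(x if re.match('[a-zA-Z]', x)
    ...:                    else ('' if x == '\\' else 'a') for x in m.group())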
Now, call re.sub:
In [58]: re.sub(r'\\.*?(?= |$)', foo, text)
Out[58]: 'here i$ aame text to madify'
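For reference, here is a self-contained sketch of the same callback approach that can be run as a script. It is a variant, not the code above: the name `sanitize_escaped` is made up for illustration, the pattern `\\\S*` assumes an escaped identifier ends at any whitespace (not just a space), and digits are kept rather than replaced, so the result differs slightly from the output shown in the question.

```python
import re


def sanitize_escaped(match):
    """Callback for re.sub: clean up one escaped identifier.

    Hypothetical helper: drops the leading backslash and replaces every
    character that is not an ASCII letter or digit with 'a'.
    """
    body = match.group()[1:]  # skip the leading backslash
    return ''.join(ch if (ch.isascii() and ch.isalnum()) else 'a' for ch in body)


text = r'here i$ \$0me text to \m*dify'

# Assumed pattern: a backslash followed by any run of non-whitespace characters.
# re.sub substitutes each match in place, so the rest of the string is untouched.
result = re.sub(r'\\\S*', sanitize_escaped, text)
print(result)  # here i$ a0me text to madify
```

Because re.sub calls the function once per match and splices the return value back in at the same position, there is no need to track match locations by hand.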