
Replacing non-alphanumeric characters in regex match using Python

I have a text file (Verilog) that contains certain string sequences (escaped identifiers) that I want to modify. In the example below, I want to find any group starting with '\\' and ending with ' ' (any printable character can be in between). After finding a group that matches these criteria, I want to replace all non-alphanumeric characters with alphanumeric ones (I don't really care which alphanumeric characters they are replaced with).

In[1]:  here i$ \$0me text to \m*dify
Out[1]: here i$ aame text to madify

I have no problem finding the groups that need replacing using regex. However, if I just use re.findall(), I no longer have the location of the words in the string, so I cannot reconstruct the string after modifying the matched groups.

Is there a way to preserve the location of the words in the string while modifying each match separately?

Note: I previously asked a very similar question here, but I oversimplified my example. I thought editing my existing question would make the existing comments and answers confusing to future readers.

My answer to your previous question still applies, with some minor modifications. Only the regex changes.

Since this is more complex, define a function to pass as a callback.

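In [57]: import re
    ...:
    ...: def foo(m):
    ...:     # Drop backslashes, keep ASCII letters, replace anything else (digits included) with 'a'.
    ...:     return ''.join(x if re.match('[a-zA-Z]', x) else ('' if x == '\\' else 'a')
    ...:                    for x in m.group())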

Now, call re.sub with foo as the replacement (text holds the input string from the question):

In [58]: re.sub(r'\\.*?(?= |$)', foo, text)
Out[58]: 'here i$ aame text to madify'
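For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The names ESCAPED and sanitize are hypothetical, not part of the original answer. It also makes one assumption that differs from the output above: it uses str.isalnum(), so digits are kept rather than replaced, and the result is 'a0me' instead of 'aame'. Note that str.isalnum() is Unicode-aware, which should not matter for plain ASCII Verilog. The re.finditer loop at the end only illustrates that each match carries its own position, which is what lets re.sub splice the replacements back in place.

```python
import re

# Escaped identifier: a backslash, then the shortest run of characters
# up to (but not including) the next space or the end of the string.
ESCAPED = re.compile(r'\\.*?(?= |$)')


def sanitize(match, filler='a'):
    """Drop the leading backslash; swap other non-alphanumerics for `filler`."""
    body = match.group()[1:]  # skip the leading backslash
    return ''.join(ch if ch.isalnum() else filler for ch in body)


text = r'here i$ \$0me text to \m*dify'

# re.sub calls sanitize once per match and splices each result back in place,
# so the rest of the string is left untouched.
print(ESCAPED.sub(sanitize, text))  # here i$ a0me text to madify

# re.finditer yields the same matches as match objects, positions included.
for m in ESCAPED.finditer(text):
    print(m.span(), m.group())
```

If the positions themselves are needed (for logging, say), m.start() and m.end() are available on every match object, both inside the callback and in the finditer loop.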

