
Replacing non-alphanumeric characters in regex match using Python

I have a text file (Verilog) that contains certain string sequences (escaped identifiers) that I want to modify. In the example below, I want to find any group starting with '\\' (a backslash) and ending with ' ' (any printable character can be in between). After finding a group that matches these criteria, I want to replace all non-alphanumeric characters with alphanumeric ones (I don't really care which alphanumeric characters they get replaced with).

In[1]:  here i$ \$0me text to \m*dify
Out[1]: here i$ aame text to madify

I have no problem finding the groups that need replacing using regex. However, if I just use re.findall(), I no longer have the location of the words in the string, so I cannot reconstruct the string after modifying the matched groups.

Is there a way to preserve the location of the words in the string while modifying each match separately?

Note: I previously asked a very similar question here, but I oversimplified my example. I thought editing my existing question would make the existing comments and answers confusing to future readers.

My answer to your previous question still applies, with some minor modifications. Only the regex changes.

Since this is more complex, define a function to pass as a callback.

In [56]: import re
    ...: text = r'here i$ \$0me text to \m*dify'

In [57]: def foo(m):
    ...:     return ''.join(x if re.match('[a-zA-Z]', x)
    ...:                    else ('' if x == '\\' else 'a') for x in m.group())

Now, call re.sub:

In [58]: re.sub(r'\\.*?(?= |$)', foo, text)
Out[58]: 'here i$ aame text to madify'
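
For reference, here is a self-contained sketch of the same callback approach that can be run as a script. The names ESCAPED_ID and sanitize_escaped_identifiers, and the keep_digits option, are made up for illustration and are not part of the answer above. It also swaps the pattern for r'\\\S+' (a backslash followed by non-whitespace), on the assumption that an escaped identifier ends at any whitespace, including a newline, rather than only at a space or the end of the string. Like foo, it treats digits as characters to replace by default, which is what gives 'aame' rather than 'a0me'.

```python
import re
import string

# Hypothetical pattern: a backslash followed by one or more non-whitespace
# characters. This is a swapped-in variant of the answer's r'\\.*?(?= |$)'.
ESCAPED_ID = re.compile(r'\\\S+')


def sanitize_escaped_identifiers(text, keep_digits=False):
    """Rewrite each escaped identifier in place, leaving other text untouched.

    Backslashes are dropped, allowed characters are kept, and every other
    character is replaced with 'a' (the filler is arbitrary).
    """
    allowed = string.ascii_letters + (string.digits if keep_digits else '')

    def fix(match):
        # re.sub calls this once per match and splices the return value
        # back into the string at the position of that match.
        return ''.join(
            ch if ch in allowed else 'a'
            for ch in match.group()
            if ch != '\\'
        )

    return ESCAPED_ID.sub(fix, text)


if __name__ == '__main__':
    sample = r'here i$ \$0me text to \m*dify'
    print(sanitize_escaped_identifiers(sample))
    print(sanitize_escaped_identifiers(sample, keep_digits=True))
```

Because re.sub does the splicing itself, there is no need to track match positions by hand. If the positions are needed for some other reason, re.finditer yields match objects whose span() gives the start and end offsets.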
