简体   繁体   中英

Can I use re.sub (or regexobject.sub) to replace text in a subgroup?

I need to parse a configuration file which looks like this (simplified):

<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>

My goal is to be able to change parameters specific to a particular link, but I'm having trouble getting substitution to work correctly. I have a regex that can isolate a parameter value on a specific link, where the value is contained in capture group 1:

link_id = r'id="1"'
parameter = 'mode'
link_regex = '<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>([\w\W]*)</%s>[\w\W]*</link>' \
% (link_id, parameter, parameter)

Thus,

print re.search(final_regex, f_read).group(1)

prints ipsec

The examples in the regex howto all seem to assume that one wants to use the capture group in the replacement, but what I need to do is replace the capture group itself (eg change the Link1 mode from ipsec to udp).

I have to give you the obligatory: "don't use regular expressions to do this."

Check out how very easily awesome it is to do this with BeautifulSoup , for example:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> html = """
... <config>
... <links>
... <link name="Link1" id="1">
...  <encapsulation>
...   <mode>ipsec</mode>
...  </encapsulation>
... </link>
... <link name="Link2" id="2">
...  <encapsulation>
...   <mode>udp</mode>
...  </encapsulation>
... </link>
... </links>
... </config>
... """
>>> soup = BeautifulStoneSoup(html)
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>ipsec</mode>
</encapsulation>
</link>
>>> soup.find('link', id=1).mode.contents[0].replaceWith('whatever')
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>whatever</mode>
</encapsulation>
</link>

Looking at your regular expression I can't really tell if this is exactly what you wanted to do, but whatever it is you want to do, using a library like BeautifulSoup is much, much, better than trying to patch a regular expression together. I highly recommend going this route if possible.

This looks like valid XML, in that case you don't need BeautifulSoup, definitely not the regex, just load XML using any good XML library, edit it and print it out, here is a approach using ElementTree:

import xml.etree.cElementTree as ET

s = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""
configElement = ET.fromstring(s)

for modeElement in configElement.findall("*/*/*/mode"):
    modeElement.text = "udp"

print ET.tostring(configElement)

It will change all mode elements to udp , this is the output:

<config>
<links>
<link id="1" name="Link1">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
<link id="2" name="Link2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>

Supposing that your link_regex is correct, you can add parenthesis like this:

(<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>)([\w\W]*)(</%s>[\w\W]*</link>)

and then you could do:

p = re.compile(link_regex)
replacement = 'foo'
print p.sub(r'\g<1>' + replacement + r'\g<3>' , f_read)

not sure i'd do it that way, but the quickest way would be to shift the captures:

([\\w\\W] [\\w\\W] <%s>)[\\w\\W] ([\\w\\W] )' and replace with group1 +mode+group2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM