
Python Regular Expression searching using previous result

Sorry, I'm new to this, but I couldn't find an answer to a question that I don't even know how to ask.

Let's say I have an XML file that has something like this:

<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field2>
       <name>Bob</name>
   </field2>
   <field3>
        <name>Spam</name>
   </field3>
</fields>

And I would like to delete any field element whose name is Bob. I can try:

import re
regex = re.compile(r"<fields>.*<field\d><name>Bob</field\d>.*</fields>")
data = regex.sub("", data)

My dilemma is that everything between <fields> and </fields> gets deleted. How can I specify that I want the \d to be the same in both places, so that I delete only what's between <field2> and </field2>? In effect, I want the resulting XML to look like:

<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field3>
        <name>Spam</name>
   </field3>
</fields>

thanks!

Use a backreference:

import re

text = """<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field2>
       <name>Bob</name>
   </field2>
   <field3>
        <name>Spam</name>
   </field3>
</fields>"""

pattern = re.compile(
    r'(<field(?P<n>\d)>[\s\S]+Bob[\s\S]+</field(?P=n)>)')

print(pattern.sub('', text))

# <fields>
#    <field1>
#        <name>Frank</name>
#    </field1>
#
#    <field3>
#         <name>Spam</name>
#    </field3>
# </fields>
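A slightly tighter variant, sketched under the assumption that each <fieldN> element contains nothing but a single <name> child. \d+ accepts multi-digit numbers (field12 below is a hypothetical element added only to show that), the \s* between tags keeps the match from running into a neighbouring field, and consuming the leading indentation plus the trailing newline avoids leaving an empty line behind.

```python
import re

# field12 is a hypothetical element added to show that \d+ handles multi-digit numbers.
text = """<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field2>
       <name>Bob</name>
   </field2>
   <field12>
        <name>Spam</name>
   </field12>
</fields>"""

# Assumes each <fieldN> holds only a <name> element, separated by whitespace.
pattern = re.compile(
    r'[ \t]*<field(?P<n>\d+)>'   # indentation + opening tag; capture the number as "n"
    r'\s*<name>Bob</name>\s*'     # the exact name we want to drop
    r'</field(?P=n)>\n?'          # closing tag with the SAME number, plus its newline
)

cleaned = pattern.sub('', text)
print(cleaned)
```

Because \s* cannot skip over other tags, this version only matches when the whole field body is <name>Bob</name>; anything more complex is a good sign it is time for a real XML parser.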

From the Python re documentation (https://docs.python.org/2/library/re.html):

(?P=name) A backreference to a named group; it matches whatever text was matched by the earlier group named name.
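A minimal illustration of that rule, alongside the equivalent unnamed group with the numbered backreference \1 (the test strings are made up for the example, and fullmatch() needs Python 3.4+):

```python
import re

# Named group "n" captures the digits; (?P=n) must match exactly the same text again.
named = re.compile(r'<field(?P<n>\d+)>.*?</field(?P=n)>')

# Equivalent pattern using an unnamed group and the numbered backreference \1.
numbered = re.compile(r'<field(\d+)>.*?</field\1>')

# Made-up test strings: only the first one has matching tag numbers.
for candidate in ['<field2>Bob</field2>', '<field2>Bob</field3>', '<field2>Bob</field23>']:
    print(candidate,
          named.fullmatch(candidate) is not None,
          numbered.fullmatch(candidate) is not None)
```

Only '<field2>Bob</field2>' matches; in the other two strings the closing tag does not repeat the captured "2" exactly.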

Also, as @JimDennis mentioned, it is a really bad idea to use regular expressions to parse or process XML data. Use an XML parser instead!

Please don't use regular expressions to parse XML, HTML, or other SGML-based text. Many parsers do use regular expressions at the lowest level, but parsing these formats is rife with pitfalls, and your code will be much more robust if you use libraries that have already been written (and debugged) for the job.

I'd recommend reading "How do I parse XML in Python?" here on Stack Overflow for more details.
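For comparison, here is a sketch of the parser approach using the standard library's xml.etree.ElementTree, assuming the input is well-formed and the field elements are direct children of <fields>:

```python
import xml.etree.ElementTree as ET

xml_text = """<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field2>
       <name>Bob</name>
   </field2>
   <field3>
        <name>Spam</name>
   </field3>
</fields>"""

root = ET.fromstring(xml_text)

# Iterate over a copy of the children so removing elements doesn't disturb the loop.
for field in list(root):
    # findtext() returns the text of the first <name> child, or None if there is none.
    if field.findtext('name') == 'Bob':
        root.remove(field)

result = ET.tostring(root, encoding='unicode')
print(result)
```

No field numbering is involved at all here, and since remove() also drops the removed element's trailing whitespace (its tail), no blank line is left behind.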

To answer your specific question: you could do this with a regular expression backreference. A capturing group "captures" part of the matched text, and you can refer back to it afterwards, usually from code that uses the match results, but also within later parts of the same regular expression.
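A small sketch of the "from code" side, again assuming each field contains only a <name> element: the named groups are read back with Match.group() after finditer(), while (?P=n) still ties each closing tag to its opening tag inside the pattern.

```python
import re

text = """<fields>
   <field1>
       <name>Frank</name>
   </field1>
   <field2>
       <name>Bob</name>
   </field2>
   <field3>
        <name>Spam</name>
   </field3>
</fields>"""

field_re = re.compile(
    r'<field(?P<n>\d+)>\s*'          # opening tag; capture the number as "n"
    r'<name>(?P<name>[^<]*)</name>'  # capture the name text
    r'\s*</field(?P=n)>'             # closing tag must repeat the same number
)

# Read the captured groups back from each match object.
names = {m.group('n'): m.group('name') for m in field_re.finditer(text)}
print(names)  # {'1': 'Frank', '2': 'Bob', '3': 'Spam'}
```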
