
Remove xml tags using Python 3.5

I'm new to Python. I'm trying to remove an XML tag from an XML document: I want to remove all of the <tag2> and </tag2> tags but keep the "foo" and "bar" text inside them. Any suggestions? I'd like to avoid lxml.

  <entry name="xml">
    <tag>
      <tag2>foo</tag2>
    </tag>
    <tag3>
      <tag2>bar</tag2>
    </tag3>
    <tag4>
      <tag2>foo</tag2>
    </tag4>
    <tag5>
      <tag2>bar</tag2>
    </tag5>
  </entry> 

EDIT: Here's what I need the output to be

  <entry name="xml">
    <tag>
      foo
    </tag>
    <tag3>
      bar
    </tag3>
    <tag4>
      foo
    </tag4>
    <tag5>
      bar
    </tag5>
  </entry>

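You could iterate over the element tree with xml.etree.ElementTree from the standard library. This builds a list of the text of every element that contains non-whitespace text; it does not modify the document itself.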

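import xml.etree.ElementTree as ET

tree = ET.parse('x.xml')
root = tree.getroot()

# Collect the text of every element whose text is not just whitespace.
text = []
for child in root.iter():
    # child.text is None for elements with no text, so check it first.
    if child.text and child.text.strip():
        text.append(child.text)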

Or, more concisely, as a list comprehension (suggested by David Zemens):

text = [child.text for child in tree.iter() if child.text and child.text.strip()]
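The snippets above only collect the text; they leave the <tag2> elements in the tree. To get the output shown in the question, each <tag2> has to be removed and its text spliced into its parent. Here is a minimal sketch of one way to do that with ElementTree alone. The helper name unwrap_tag is made up for this example (it is not part of the library), and the sample is inlined as a string instead of being read from x.xml.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<entry name="xml">
    <tag>
      <tag2>foo</tag2>
    </tag>
    <tag3>
      <tag2>bar</tag2>
    </tag3>
</entry>"""


def unwrap_tag(parent, tag_name):
    """Remove every <tag_name> descendant of parent, keeping its text and children.

    unwrap_tag is a hypothetical helper for this example, not an ElementTree API.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != tag_name:
            unwrap_tag(child, tag_name)  # recurse into elements we keep
            i += 1
            continue

        text = child.text or ''
        tail = child.tail or ''
        grandchildren = list(child)
        if grandchildren:
            # Text after the removed element now follows its last child.
            grandchildren[-1].tail = (grandchildren[-1].tail or '') + tail
        else:
            text += tail

        # The removed element's text joins whatever preceded it.
        if i == 0:
            parent.text = (parent.text or '') + text
        else:
            previous = parent[i - 1]
            previous.tail = (previous.tail or '') + text

        # Replace the element with its own children, in the same position.
        del parent[i]
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(i + offset, grandchild)
        # Do not advance i: the element now at position i is still unchecked.


root = ET.fromstring(SAMPLE)
unwrap_tag(root, 'tag2')
print(ET.tostring(root, encoding='unicode'))
```

To work on a file instead, parse it with ET.parse('x.xml'), call unwrap_tag(tree.getroot(), 'tag2'), and save the result with tree.write(...) under a file name of your choice. ElementTree elements have no parent pointers, which is why the sketch walks down from each parent rather than looking up from each <tag2>.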
