
How to ignore xml children that have a certain attribute?

Let's say I have this XML (it's long, but I want everything to be as clear as possible):

<?xml version="1.0" encoding="utf-8"?>
  <Nodes>
    <Node ComponentID="1">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="2">
      <Settings>
        <Value name="Text Box (1)"> SettingA </Value>
        <Value name="Text Box (2)"> SettingB </Value>
        <Value name="Text Box (3)"> SettingC </Value>
        <Value name="Text Box (4)"> SettingD </Value>
      <AdvSettings State="Off"/>
      </Settings>
    </Node>
    <Node ComponentID="3">
      <Settings>
        <Value name="Text Box (1)"> SettingG </Value>
        <Value name="Text Box (2)"> SettingH </Value>
        <Value name="Text Box (3)"> SettingI </Value>
        <Value name="Text Box (4)"> SettingJ </Value>
      <AdvSettings State="On"/>
      </Settings>
    </Node>
    <Node ComponentID="4">
      <Configuration>
        <Disabled Value="True"/>
      </Configuration>
      <Childnodes>
        <Node ComponentID="5">
          <Settings>
            <Value name="Text Box (1)"> SettingK </Value>
            <Value name="Text Box (2)"> SettingL </Value>
            <Value name="Text Box (3)"> SettingM </Value>
            <Value name="Text Box (4)"> SettingN </Value>
          <AdvSettings State="On"/>
          </Settings>
        </Node>
        <Node ComponentID="6">
          <Settings>
            <Value name="Text Box (1)"> SettingO </Value>
            <Value name="Text Box (2)"> SettingP </Value>
            <Value name="Text Box (3)"> SettingQ </Value>
            <Value name="Text Box (4)"> SettingR </Value>
          <AdvSettings State="On"/>
          </Settings>
        </Node>
      </Childnodes>
    </Node>
    <Node ComponentID="7">
      <Configuration>
        <Disabled Value="False"/>
      </Configuration>
      <Childnodes>
        <Node ComponentID="8">
          <Settings>
            <Value name="Text Box (1)"> SettingK </Value>
            <Value name="Text Box (2)"> SettingL </Value>
            <Value name="Text Box (3)"> SettingM </Value>
            <Value name="Text Box (4)"> SettingN </Value>
          <AdvSettings State="On"/>
          </Settings>
        </Node>
      </Childnodes>
    </Node>
  </Nodes>

So my goal with Python was to get the values of Text Box (1) and Text Box (2) for each Node that has "AdvSettings" set to "On". I ended up using this script (with help from a previous question):

import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch2.xml')
root = tree.getroot()

nodes = root.findall('.//Node')
for node in nodes:
    adv = node.find('.//AdvSettings')
    if adv is None:
        continue
    flag = adv.attrib.get('State','Off')
    if flag == 'On':
        print(node.attrib.get('ComponentID'),node.find('.//Value[@name="Text Box (1)"]').text.strip(),node.find('.//Value[@name="Text Box (2)"]').text.strip())  

However, as a new request, we do not want the values of components that are inside a disabled component. The script I had does not take Disabled into account. So with the XML above, we do not want information about components 5 or 6. Keep in mind that one such "container" can contain multiple components or none at all.

I tried that with the script below, but it gives no results:

import xml.etree.ElementTree as ET
tree = ET.parse('XMLSearch2.xml')
root = tree.getroot()

nodes = root.findall('.//Node')
for node in nodes:
    Container = node.find('.//Disabled')
    if Container is None:
        continue
    adv = node.find('.//AdvSettings')
    if adv is None:
        continue
    flag = adv.attrib.get('State','Off')
    state = Container.attrib.get('Value',None)
    if state == 'false':
        if flag == 'On':
            print(node.attrib.get('ComponentID'),node.find('.//Value[@name="Text Box (1)"]').text.strip(),node.find('.//Value[@name="Text Box (2)"]').text.strip())
        else:
            continue
    else: 
        continue    

Any tips, suggestions?

For jobs like these you should use XPath, either with lxml, which supports XPath 1.0, or, if you are more adventurous, with elementpath, which also supports XPath 2.0 and later.

So with lxml:

from lxml import etree

vals = """[your xml above]"""
doc = etree.XML(vals.encode())

targets = doc.xpath('//Node[.//AdvSettings/@State="On"]//Settings//Value')
names = ["Text Box (1)","Text Box (2)"]
for targ in targets:
    if targ.attrib['name'] in names:
        print(targ.text)

If you can use elementpath, it gets simpler:

from elementpath import select

# doc is the tree parsed with lxml in the snippet above
expression = '//Node[.//AdvSettings/@State="On"]//Settings//Value//(.[@name="Text Box (1)"],.[@name="Text Box (2)"])/text()'
print(select(doc, expression))
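
Note that the expressions above only filter on AdvSettings; they do not yet leave out components that sit inside a disabled container. The attempt in the question prints nothing for two reasons: it looks for Disabled beneath every Node (components 5, 6 and 8 have no such descendant, so they are skipped), and it compares against 'false' while the XML contains "False". As a minimal sketch of one way to handle this with only the standard library, the walk below descends through the tree and stops at any container whose own Configuration says Disabled. The helper names is_disabled and collect, and the shortened sample XML, are made up for illustration and assume the same layout as in the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same layout as the XML in the question (assumption).
XML = """<Nodes>
  <Node ComponentID="1">
    <Settings>
      <Value name="Text Box (1)"> SettingA </Value>
      <Value name="Text Box (2)"> SettingB </Value>
      <AdvSettings State="On"/>
    </Settings>
  </Node>
  <Node ComponentID="2">
    <Settings>
      <Value name="Text Box (1)"> SettingC </Value>
      <Value name="Text Box (2)"> SettingD </Value>
      <AdvSettings State="Off"/>
    </Settings>
  </Node>
  <Node ComponentID="4">
    <Configuration><Disabled Value="True"/></Configuration>
    <Childnodes>
      <Node ComponentID="5">
        <Settings>
          <Value name="Text Box (1)"> SettingK </Value>
          <Value name="Text Box (2)"> SettingL </Value>
          <AdvSettings State="On"/>
        </Settings>
      </Node>
    </Childnodes>
  </Node>
  <Node ComponentID="7">
    <Configuration><Disabled Value="False"/></Configuration>
    <Childnodes>
      <Node ComponentID="8">
        <Settings>
          <Value name="Text Box (1)"> SettingM </Value>
          <Value name="Text Box (2)"> SettingN </Value>
          <AdvSettings State="On"/>
        </Settings>
      </Node>
    </Childnodes>
  </Node>
</Nodes>"""

WANTED = ("Text Box (1)", "Text Box (2)")


def is_disabled(node):
    """True if this Node's own Configuration marks it as disabled."""
    flag = node.find("Configuration/Disabled")
    # Compare case-insensitively: the XML uses "True"/"False", not "true"/"false".
    return flag is not None and flag.get("Value", "").strip().lower() == "true"


def collect(parent, out):
    """Visit direct Node children, skipping everything under a disabled container."""
    for node in parent.findall("Node"):
        if is_disabled(node):
            continue  # do not descend into a disabled container
        adv = node.find("Settings/AdvSettings")
        if adv is not None and adv.get("State") == "On":
            values = [
                node.findtext(f'Settings/Value[@name="{name}"]', default="").strip()
                for name in WANTED
            ]
            out.append((node.get("ComponentID"), *values))
        childnodes = node.find("Childnodes")
        if childnodes is not None:
            collect(childnodes, out)  # handles containers with many or no components
    return out


results = collect(ET.fromstring(XML), [])
for row in results:
    print(*row)
```

With the sample above this reports components 1 and 8 only. If you stay with lxml, the same filter can probably be expressed directly in XPath 1.0 with an ancestor predicate, for example //Node[Settings/AdvSettings/@State="On"][not(ancestor::Node[Configuration/Disabled/@Value="True"])]; the standard library's ElementTree does not support the ancestor axis, which is why the sketch walks the tree instead.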


 