
Fast traversal of an lxml tree while skipping a specific branch

Suppose I have an etree like the following:

my_data.xml
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <country name="Liechtenstein" xmlns="aaa:bbb:ccc:liechtenstein:eee">
    <rank updated="yes">2</rank>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <holidays>
      <christmas>Yes</christmas>
    </holidays>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>

Parsing:

from lxml import etree

xtree = etree.parse('my_data.xml')
xroot = xtree.getroot()

I want to traverse the tree and do stuff to all branches except certain ones. In this example, I want to exclude the ethnicity branch:

node_to_exclude = xroot.xpath('.//*[local-name()="ethnicity"]')
exclude_path = xtree.getelementpath(node_to_exclude[0])

for element in xroot.iter('*'):
    if exclude_path not in xtree.getelementpath(element):
        pass  # ...do stuff...

But this still traverses the entire tree. Is there a better or faster way (i.e., one that skips the entire ethnicity branch at once)? I'm looking for a syntactic solution, not a recursive algorithm.

XPath can do this for you:

for element in xroot.xpath('.//*[not(ancestor-or-self::*[local-name()="ethnicity"])]'):
    # ...do stuff...
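To see what that expression selects, here is a minimal, self-contained sketch. It assumes a trimmed-down copy of the sample document inlined as a byte string (the variable names XML, kept and kept_names are made up for this illustration), so it runs without my_data.xml:

```python
from lxml import etree

# Trimmed-down version of the sample document, inlined so the snippet runs as is.
XML = b"""<data>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>"""

xroot = etree.fromstring(XML)

# Keep every descendant that is not <ethnicity> and has no <ethnicity> ancestor.
kept = xroot.xpath('.//*[not(ancestor-or-self::*[local-name()="ethnicity"])]')

# Drop the namespace part of each tag for readable output.
kept_names = [etree.QName(el).localname for el in kept]
print(kept_names)  # ['country', 'rank', 'year']
```

Note that the XPath engine still has to test every element against the predicate; what you gain is that the filtering happens inside libxml2 rather than in a Python loop.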

It might or might not improve performance to specify which ancestor you mean, so measure it. For example, if <ethnicity xmlns="..."> is always a child of the top-level element (i.e., the penultimate node on the ancestor-or-self axis, which counts outward from the current element and ends at the root), you could do this:

for element in xroot.xpath('.//*[not(ancestor-or-self::*[last()-1][local-name()="ethnicity"])]'):
    # ...do stuff...
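Since the advice is to measure, here is one rough way to compare the two expressions. It is only a sketch: the synthetic document, its size, and the run count are assumptions chosen for illustration, and the timings will vary by machine and lxml version, so none are shown.

```python
import timeit
from lxml import etree

# Synthetic document (an assumption for illustration): many <country> branches
# plus a single <ethnicity> branch directly under the root.
countries = ''.join(
    '<country xmlns="aaa:bbb:ccc:country:eee"><rank>%d</rank><year>2011</year></country>' % i
    for i in range(2000)
)
xml = ('<data>' + countries +
       '<ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee"><malay><holidays/></malay></ethnicity>'
       '</data>')
xroot = etree.fromstring(xml)

# Precompile both expressions so that parsing the XPath is not part of the timing.
any_ancestor = etree.XPath('.//*[not(ancestor-or-self::*[local-name()="ethnicity"])]')
fixed_level = etree.XPath('.//*[not(ancestor-or-self::*[last()-1][local-name()="ethnicity"])]')

# Sanity check: both expressions should select exactly the same elements.
same_result = any_ancestor(xroot) == fixed_level(xroot)
print('same result:', same_result)

for label, expr in (('any ancestor', any_ancestor), ('fixed level', fixed_level)):
    seconds = timeit.timeit(lambda: expr(xroot), number=5)
    print('%-12s %.3f s for 5 runs' % (label, seconds))
```

Run it against a document shaped like your real data before drawing conclusions; the relative cost depends heavily on tree depth and size.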

Of course, you can also loop over the top-level children yourself and skip the unwanted branch:

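for child in xroot.iterchildren('*'):
    # Tags are namespace-qualified, e.g. '{aaa:bbb:ccc:ethnicity:eee}ethnicity'.
    if etree.QName(child).localname == 'ethnicity':
        continue
    # iter('*') yields child and its descendants only. An absolute path
    # such as '//*' would select every element in the whole document.
    for element in child.iter('*'):
        pass  # ...do stuff...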
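If you want the traversal itself to skip the branch rather than filter after the fact, another option (not part of the approaches above) is etree.iterwalk, whose walker has a skip_subtree() method. This sketch assumes lxml 4.1 or newer and again uses an inlined, trimmed-down copy of the sample document; XML and visited are names made up for the illustration.

```python
from lxml import etree

# Trimmed-down version of the sample document, inlined so the snippet runs as is.
XML = b"""<data>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>
        <ramadan>Yes</ramadan>
      </holidays>
    </malay>
  </ethnicity>
</data>"""

xroot = etree.fromstring(XML)

visited = []
walker = etree.iterwalk(xroot, events=('start', 'end'))
for event, element in walker:
    if event != 'start':
        continue  # only act when an element is entered
    if etree.QName(element).localname == 'ethnicity':
        walker.skip_subtree()  # do not descend into this branch
        continue
    visited.append(etree.QName(element).localname)  # ...do stuff...

print(visited)
```

Unlike the './/*' expressions, the walk starts at the root element itself, so <data> is visited too. The descendants of <ethnicity> are never handed to the loop at all.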


 