
How to select the following specific XML nodes using XPath?

I have an XML doc like the following:

<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>

I need to select all nodes whose distName path ends with "c" followed by a number, such as "a/b/c1" and "a/b/c2", but not "a/b/c6/d" or "a/b/c4/d/e".

If I try the following:

      cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")

Then this will include "a/b/c6/d" and "a/b/c4/d/e" which is not what I require.

So is there a way to do this in one or two lines of code? I could do it with a loop, but I would rather not, because the real XML doc has thousands of nodes.

PS: Python 2.7, lxml

I'm afraid this can't be done using pure XPath 1.0, which is the XPath version that lxml supports.

As an alternative, you can split the attribute value on / , take the last segment, and check whether it starts with c , all in one line using a list comprehension. For example:

>>> raw = '''<Objects>
...   <object distName="a/b">  </object>
...   <object distName="a/b/c1">  </object>
...   <object distName="a/b/c4/d/e">  </object>
...   <object distName="a/b/c2">  </object>
...   <object distName="a/b/c6/d">  </object>
... </Objects>'''
>>> from lxml import etree
>>> xmlDoc = etree.fromstring(raw)
>>> cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")
>>> result = [etree.tostring(n) for n in cNodes if n.attrib["distName"].split('/')[-1].startswith("c")]
>>> print result
['<object distName="a/b/c1">  </object>\n  ', '<object distName="a/b/c2">  </object>\n  ']
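Note that startswith("c") also accepts a last segment such as "cfg", which is not "c" plus a number. A minimal sketch of a stricter check follows, assuming a regular expression is acceptable. It shows two variants: filtering in Python with the re module, and using the EXSLT regular-expressions extension that lxml exposes to XPath (an extension, so not pure XPath 1.0). The "a/b/cfg" entry and the variable names are hypothetical additions for illustration.

```python
import re
from lxml import etree

raw = """<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
  <object distName="a/b/cfg">  </object>
</Objects>"""  # "a/b/cfg" is a hypothetical extra entry, not in the question

xml_doc = etree.fromstring(raw)

# The last path segment must be "c" followed by one or more digits.
last_segment = re.compile(r"(?:^|/)c[0-9]+$")

# Variant 1: select every <object>, then filter on the Python side.
by_python = [
    n.get("distName")
    for n in xml_doc.iter("object")
    if last_segment.search(n.get("distName", ""))
]

# Variant 2: let XPath do the filtering through the EXSLT "re" namespace.
EXSLT_RE = "http://exslt.org/regular-expressions"
nodes = xml_doc.xpath(
    "//object[re:test(@distName, '(^|/)c[0-9]+$')]",
    namespaces={"re": EXSLT_RE},
)
by_xpath_names = [n.get("distName") for n in nodes]

print(by_python)
print(by_xpath_names)
```

Both variants still visit every node once, so for a document with thousands of nodes the choice is mostly about readability.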

Unfortunately, it's not simple to express a pattern-matching condition in XPath 1.0. But if you can make certain assumptions about what you're looking for, you can craft such a query.

//object[starts-with(@distName, 'a/b/c') and substring-after(@distName, 'a/b/c') >= 0]

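Breaking it up: we first check that the distName attribute starts with a/b/c , then that everything after that prefix is a number. The comparison works because XPath converts the remaining string to a number, and a non-numeric string such as 6/d becomes NaN, which fails the >= 0 test. Depending on your needs, this might be enough.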
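The numeric comparison accepts anything XPath can parse as a non-negative number, so a value like "a/b/c1.5" would also match. If only digits should be allowed, one possible tightening (a sketch, still assuming the fixed prefix a/b/c) is to strip the digits with translate() and require that nothing is left. The "a/b/c1.5" entry and the variable names below are hypothetical, added only to show the difference.

```python
from lxml import etree

raw = """<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
  <object distName="a/b/c1.5">  </object>
</Objects>"""  # "a/b/c1.5" is a hypothetical extra entry, not in the question

xml_doc = etree.fromstring(raw)

# Query from the answer: the remainder after the prefix must be a number >= 0.
loose = (
    "//object[starts-with(@distName, 'a/b/c')"
    " and substring-after(@distName, 'a/b/c') >= 0]"
)

# Stricter variant: the remainder must be non-empty and, once all digits
# are removed by translate(), must be the empty string.
strict = (
    "//object[starts-with(@distName, 'a/b/c')"
    " and string-length(substring-after(@distName, 'a/b/c')) > 0"
    " and translate(substring-after(@distName, 'a/b/c'),"
    " '0123456789', '') = '']"
)

loose_names = [n.get("distName") for n in xml_doc.xpath(loose)]
strict_names = [n.get("distName") for n in xml_doc.xpath(strict)]

print(loose_names)   # includes the decimal entry "a/b/c1.5"
print(strict_names)  # digits-only remainders
```

Both queries are plain XPath 1.0, so they should work with any XPath 1.0 engine, not just lxml.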
