How to select the following specific XML nodes using XPath?

Question

I have an XML doc like the following:

<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>

I need to select all nodes whose distName ends with "c" + number, like "a/b/c1" and "a/b/c2", but not "a/b/c6/d" or "a/b/c4/d/e".

If I try the following:

      `cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")`

Then this will include "a/b/c6/d" and "a/b/c4/d/e" which is not what I require.

Is there a way to do this in one or two lines of code? I could do it with a loop, but I would rather not, because the real XML doc has thousands of nodes.

PS: Python 2.7, lxml

Answer 1

I'm afraid this can't be done using pure XPath 1.0, which is the XPath version that lxml supports.

As an alternative, you can split the attribute value on / , take the last segment, and check whether it starts with c , all in one line using a list comprehension. For example:

>>> raw = '''<Objects>
...   <object distName="a/b">  </object>
...   <object distName="a/b/c1">  </object>
...   <object distName="a/b/c4/d/e">  </object>
...   <object distName="a/b/c2">  </object>
...   <object distName="a/b/c6/d">  </object>
... </Objects>'''
... 
>>> from lxml import etree
>>> xmlDoc = etree.fromstring(raw)
>>> cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")
>>> result = [etree.tostring(n) for n in cNodes if n.attrib["distName"].split('/')[-1].startswith("c")]
>>> print result
['<object distName="a/b/c1">  </object>\n  ', '<object distName="a/b/c2">  </object>\n  ']
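Note that startswith("c") only checks the first letter of the last segment, not that a number follows it. If that matters, a stricter variant could match the last segment against a regular expression instead. The following is a minimal sketch under that assumption; the pattern and the names last_segment and matches are illustrative and not part of the answer above. It is written to run on both Python 2.7 and Python 3.

```python
import re
from lxml import etree

raw = '''<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>'''

xml_doc = etree.fromstring(raw)

# Assumed rule: the last path segment must be "c" followed by one or more digits.
last_segment = re.compile(r'^c\d+$')

# Split each distName on "/", keep the node if its last segment matches.
matches = [
    n.get("distName")
    for n in xml_doc.iter("object")
    if last_segment.match(n.get("distName", "").split('/')[-1])
]

print(matches)  # ['a/b/c1', 'a/b/c2']
```

This is still a single pass over the nodes, so it should remain cheap for a document with thousands of elements.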

Answer 2

Unfortunately, it's not very simple to express a condition that matches patterns using XPath 1.0. But if you can make certain assumptions about what you're looking for, you can craft such a query.

//object[starts-with(@distName, 'a/b/c') and substring-after(@distName, 'a/b/c') >= 0]

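Breaking it up, we first check whether the distName attribute starts with a/b/c . Then we check whether everything after that prefix is a number: the >= comparison converts the remainder to a number, and a non-numeric remainder such as 6/d becomes NaN, so the comparison fails. Depending on your needs, this might just be enough.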
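To see how this query behaves with lxml, here is a small sketch that runs it against the sample document. One extra node, a/b/c1.5, is added purely for illustration (it is not in the question's data) to show that the numeric comparison also accepts non-integer numbers. The second query, digits_only, is an assumed stricter variant that uses translate() to require that the remainder consists of digits only; it is offered as one possible tightening, not as the answer's method.

```python
from lxml import etree

raw = '''<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
  <object distName="a/b/c1.5">  </object>
</Objects>'''  # the last node is a hypothetical extra, added for illustration

xml_doc = etree.fromstring(raw)

# The query from the answer: the remainder after "a/b/c" must convert to a number >= 0.
numeric = ("//object[starts-with(@distName, 'a/b/c') "
           "and substring-after(@distName, 'a/b/c') >= 0]")

# Assumed stricter variant: the remainder must be non-empty and contain only digits.
# translate() deletes every digit; if nothing is left, the remainder was all digits.
digits_only = ("//object[starts-with(@distName, 'a/b/c') "
               "and string-length(substring-after(@distName, 'a/b/c')) > 0 "
               "and translate(substring-after(@distName, 'a/b/c'), '0123456789', '') = '']")

loose = [n.get("distName") for n in xml_doc.xpath(numeric)]
strict = [n.get("distName") for n in xml_doc.xpath(digits_only)]

print(loose)   # ['a/b/c1', 'a/b/c2', 'a/b/c1.5']
print(strict)  # ['a/b/c1', 'a/b/c2']
```

Both queries hard-code the a/b/c prefix, so they only fit cases where the parent path is known in advance.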
