How to select the following specific XML nodes using XPath?

I have an XML doc like the following:

<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>

I need to select all nodes whose path ends with "c" + number, such as "a/b/c1" and "a/b/c2", but not "a/b/c6/d" or "a/b/c4/d/e".

If I try the following:

      cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")

This also returns "a/b/c6/d" and "a/b/c4/d/e", which is not what I want.

So is there a way to do the job in one or maybe two lines of code? I know I could do it with a loop, but I would rather not, because the real XML doc has thousands of nodes.

PS: Python 2.7, lxml

I'm afraid this can't easily be done using pure XPath 1.0, which is the XPath version that lxml supports.

As an alternative, you can split the attribute on / , take the last piece, and check whether it starts with c , all in one line using a list comprehension, for example:

>>> raw = '''<Objects>
...   <object distName="a/b">  </object>
...   <object distName="a/b/c1">  </object>
...   <object distName="a/b/c4/d/e">  </object>
...   <object distName="a/b/c2">  </object>
...   <object distName="a/b/c6/d">  </object>
... </Objects>'''
... 
>>> from lxml import etree
>>> xmlDoc = etree.fromstring(raw)
>>> cNodes = xmlDoc.xpath("//object[contains(@distName, 'a/b/c')]")
>>> result = [etree.tostring(n) for n in cNodes if n.attrib["distName"].split('/')[-1].startswith("c")]
>>> print result
['<object distName="a/b/c1">  </object>\n  ', '<object distName="a/b/c2">  </object>\n  ']
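Note that startswith("c") also accepts a last segment such as "cfoo". If the requirement is strictly "c" followed by digits, a regular expression on the last segment is one way to tighten the filter. The following is only a sketch: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages (the same filter should work on lxml elements), and the names LAST_SEGMENT and select_c_nodes are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

RAW = """<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>"""

# Hypothetical pattern: the last path segment must be "c" followed by digits only.
LAST_SEGMENT = re.compile(r"c\d+$")


def select_c_nodes(root):
    """Return <object> elements whose distName ends in a 'c<number>' segment."""
    return [
        node
        for node in root.iter("object")
        if LAST_SEGMENT.match(node.get("distName", "").split("/")[-1])
    ]


root = ET.fromstring(RAW)
names = [node.get("distName") for node in select_c_nodes(root)]
print(names)
```

This is still a single pass over the elements, so it should be fine for a document with thousands of nodes.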

Unfortunately, it's not very simple to express a condition that matches patterns using XPath 1.0. But if you can make certain assumptions about what you're looking for, you can craft such a query.

//object[starts-with(@distName, 'a/b/c') and substring-after(@distName, 'a/b/c') >= 0]

Breaking it up, we first check whether the distName attribute starts with a/b/c . Then we check whether everything after that prefix is a number: substring-after() returns the remainder, the comparison with 0 converts it to a number, and a non-numeric remainder such as "6/d" becomes NaN, which fails the comparison. Depending on your needs, this might just be enough.
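To try the query above from Python, something like the following sketch should work with lxml. The second query is not part of the answer above: it relies on the EXSLT regular-expressions extension that lxml exposes (so it is no longer pure XPath 1.0), and it is shown only as a possible alternative when the "a/b/c" prefix is not fixed. The variable names are illustrative.

```python
from lxml import etree

RAW = """<Objects>
  <object distName="a/b">  </object>
  <object distName="a/b/c1">  </object>
  <object distName="a/b/c4/d/e">  </object>
  <object distName="a/b/c2">  </object>
  <object distName="a/b/c6/d">  </object>
</Objects>"""

root = etree.fromstring(RAW)

# Pure XPath 1.0: the prefix must match and the remainder must convert to a number.
numeric_suffix = root.xpath(
    "//object[starts-with(@distName, 'a/b/c') "
    "and substring-after(@distName, 'a/b/c') >= 0]"
)

# Assumption: EXSLT regex extension, bound to the conventional "re" prefix.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}
regex_match = root.xpath(
    "//object[re:test(@distName, '/c[0-9]+$')]",
    namespaces=EXSLT_RE,
)

xpath_names = [node.get("distName") for node in numeric_suffix]
regex_names = [node.get("distName") for node in regex_match]
print(xpath_names)
print(regex_names)
```

Both queries run inside the XPath engine, so no Python-level loop over the nodes is needed.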
