Python XPath SyntaxError: invalid predicate

Question

I am trying to parse an XML document like this:

<document>
    <pages>

    <page>   
       <paragraph>XBV</paragraph>

       <paragraph>GHF</paragraph>
    </page>

    <page>
       <paragraph>ash</paragraph>

       <paragraph>lplp</paragraph>
    </page>

    </pages>
</document>

And here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse("../../xml/test.xml")

root = tree.getroot()

path="./pages/page/paragraph[text()='GHF']"

print root.findall(path)

But I get an error:

print root.findall(path)
  File "X:\Anaconda2\lib\xml\etree\ElementTree.py", line 390, in findall
    return ElementPath.findall(self, path, namespaces)
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 293, in findall
    return list(iterfind(elem, path, namespaces))
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 263, in iterfind
    selector.append(ops[token[0]](next, token))
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 224, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

What is wrong with my XPath?

Follow-up

Thanks falsetru, your solution worked. I have a follow-up. Now I want to get all the paragraph elements that come before the paragraph with the text GHF. So in this case I only need the XBV element; I want to ignore ash and lplp. I guess one way to do this would be:

result = []
for para in root.findall('./pages/page/'):
    t = para.text.encode("utf-8", "ignore")
    if t == "GHF":
        break
    else:
        result.append(para)

But is there a better way to do this?

Answer 1

ElementTree's XPath support is limited. Use another library such as lxml:

import lxml.etree
root = lxml.etree.parse('test.xml')

path="./pages/page/paragraph[text()='GHF']"
print root.xpath(path)
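
For readers on Python 3.7 or later, there may be an alternative that needs no third-party package: the standard library's ElementTree in those versions accepts a `[.='text']` predicate, which matches an element by its own text content. The following is a minimal sketch under that assumption, with the XML from the question inlined as a string (the `XML` variable name is for illustration only):

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined so the sketch is self-contained.
XML = """<document>
    <pages>
    <page>
       <paragraph>XBV</paragraph>
       <paragraph>GHF</paragraph>
    </page>
    <page>
       <paragraph>ash</paragraph>
       <paragraph>lplp</paragraph>
    </page>
    </pages>
</document>"""

root = ET.fromstring(XML)

# [.='GHF'] compares the element's complete text content (Python 3.7+).
# text() is not part of ElementTree's XPath subset, hence the original error.
matches = root.findall("./pages/page/paragraph[.='GHF']")
print([m.text for m in matches])
```

On older interpreters, including the Python 2 install shown in the traceback, this predicate is not available and lxml remains the straightforward route.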

Answer 2

As @falsetru mentioned, ElementTree doesn't support the text() predicate, but it does support matching a child element by its text. In this example, you can search for a page that has a paragraph with specific text using the path ./pages/page[paragraph='GHF']. The problem is that a page contains multiple paragraph tags, so you would still have to iterate over them to find the specific paragraph. In my case, I needed to find the version of a dependency in a Maven pom.xml, and each dependency has only a single version child, so the following worked:

In [1]: import xml.etree.ElementTree as ET

In [2]: ns = {"pom": "http://maven.apache.org/POM/4.0.0"}

In [3]: ET.parse("pom.xml").findall(".//pom:dependencies/pom:dependency[pom:artifactId='some-artifact-with-hardcoded-version']/pom:version", ns)[0].text
Out[3]: '1.2.3'
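
Applied to the follow-up in the question, the "find the page, then iterate over its paragraphs" approach might look like the sketch below. It assumes Python 3 and the question's XML inlined as a string. The helper name `paragraphs_before` is hypothetical, and it only looks inside the page that contains the target paragraph.

```python
import xml.etree.ElementTree as ET
from itertools import takewhile

# The document from the question, inlined so the sketch is self-contained.
XML = """<document>
    <pages>
    <page>
       <paragraph>XBV</paragraph>
       <paragraph>GHF</paragraph>
    </page>
    <page>
       <paragraph>ash</paragraph>
       <paragraph>lplp</paragraph>
    </page>
    </pages>
</document>"""


def paragraphs_before(root, target):
    """Return the <paragraph> elements preceding the one whose text equals
    `target`, within the same <page>. Hypothetical helper for illustration;
    `target` must not contain a single quote, since it is pasted into the path.
    """
    # The child-text predicate [paragraph='...'] is supported by ElementTree:
    # it selects the first page that has a matching paragraph child.
    page = root.find("./pages/page[paragraph='%s']" % target)
    if page is None:
        return []
    # Collect paragraphs in document order until the target is reached.
    return list(takewhile(lambda p: p.text != target, page.findall("paragraph")))


root = ET.fromstring(XML)
result = paragraphs_before(root, "GHF")
print([p.text for p in result])
```

Using `takewhile` keeps the early exit of the loop in the question without a manual `break`.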

Answer 1: score 10, accepted, 2015-11-20 15:59:13

Answer 2: score 2
