Python lxml: How to get a human-readable XPath for an XML element?

Question

I have a short XML document:

<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns="http://example.com/2009/namespace">
    <tag2>
        <tag3/>
        <tag3/>
    </tag2>
</tag1>

A short Python program loads this XML file like this:

from lxml import etree

MY_NAMESPACE = 'http://example.com/2009/namespace'

# parse the document straight from the file path
tree = etree.parse('myxml.xml')
xpath = etree.XPath('/f:tag1/f:tag2/f:tag3', namespaces={'f': MY_NAMESPACE})
# get the first element that matches the XPath
elem = xpath(tree)[0]
# ask lxml for an XPath that identifies this element
print(tree.getpath(elem))

I expected this code to give me a meaningful, human-readable XPath, but instead I get a string like /*/*/*[1].

Any idea what could be causing this, and how I can diagnose the issue?

Note: Using Python 2.7.9 and lxml 2.3.

Answer 1

It looks like getpath() (which calls libxml2's xmlGetNodePath underneath) produces a positional XPath expression for namespaced documents. User mzjn pointed out in the comments that since lxml v3.4.0 there is a function, getelementpath(), that produces a human-readable path with fully qualified tag names (in "Clark notation", i.e. {namespace-uri}localname). This function builds the path by traversing the tree from the node up to the root instead of calling the libxml2 API.
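To make the difference concrete, here is a small sketch that calls both methods on the document from the question. It assumes lxml 3.4 or newer and parses the XML from an inline string rather than from myxml.xml so that it runs on its own; the variable names are illustrative only.

```python
from lxml import etree

MY_NAMESPACE = 'http://example.com/2009/namespace'
XML = b'''<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://example.com/2009/namespace">
    <tag2>
        <tag3/>
        <tag3/>
    </tag2>
</tag1>'''

root = etree.fromstring(XML)
tree = root.getroottree()
elem = tree.xpath('/f:tag1/f:tag2/f:tag3', namespaces={'f': MY_NAMESPACE})[0]

# Path generated by libxml2: positional, without tag names
positional_path = tree.getpath(elem)
# Path generated by lxml itself (lxml >= 3.4): Clark-notation tag names
element_path = tree.getelementpath(elem)

print(positional_path)
print(element_path)

# The result is an ElementPath expression relative to the root element,
# so it can be passed back to find() to locate the same element.
assert tree.find(element_path) is elem
```

Note that the value returned by getelementpath() is meant for find() and findall() rather than for xpath(), because Clark notation is not valid XPath syntax.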

Similarly, if lxml v3.4+ is not available, one can write a tree traversal function of one's own.
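One possible shape for such a traversal function is sketched below. The name readable_xpath and its prefixes argument (a mapping from namespace URI to the prefix you want in the output) are hypothetical and not part of lxml. The sketch assumes it is only given element nodes, and it raises KeyError if it meets a namespace that has no prefix in the mapping.

```python
from lxml import etree

MY_NAMESPACE = 'http://example.com/2009/namespace'
XML = b'''<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://example.com/2009/namespace">
    <tag2>
        <tag3/>
        <tag3/>
    </tag2>
</tag1>'''


def readable_xpath(elem, prefixes):
    """Build an absolute XPath for elem by walking up to the root.

    Hypothetical helper: prefixes maps namespace URI -> prefix.
    """
    steps = []
    node = elem
    while node is not None:
        qname = etree.QName(node)
        if qname.namespace is None:
            name = qname.localname
        else:
            # KeyError here means the caller supplied no prefix for this namespace
            name = '%s:%s' % (prefixes[qname.namespace], qname.localname)
        parent = node.getparent()
        if parent is not None:
            # add a 1-based index only when same-named siblings exist
            same_tag = list(parent.iterchildren(node.tag))
            if len(same_tag) > 1:
                name += '[%d]' % (same_tag.index(node) + 1)
        steps.append(name)
        node = parent
    return '/' + '/'.join(reversed(steps))


root = etree.fromstring(XML)
first, second = root[0]  # the two tag3 children of tag2
path = readable_xpath(second, {MY_NAMESPACE: 'f'})
print(path)

# The generated path should select exactly the element it was built from
assert root.xpath(path, namespaces={'f': MY_NAMESPACE}) == [second]
```

Because the output uses prefixes instead of Clark notation, it can be fed straight back into xpath() with the matching namespaces mapping, which is the kind of path the question was hoping getpath() would return.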

Answer 1 (2, accepted, 2015-08-19 11:13:35)
