Python lxml: how to get human-readable XPath for XML element?
I have a short XML document:
<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://example.com/2009/namespace">
  <tag2>
    <tag3/>
    <tag3/>
  </tag2>
</tag1>
A short Python program loads this XML file like this:
from lxml import etree

# etree.parse() accepts a filename directly, so no file handle is left open
tree = etree.parse('myxml.xml')
MY_NAMESPACE = 'http://example.com/2009/namespace'
xpath = etree.XPath('/f:tag1/f:tag2/f:tag3', namespaces={'f': MY_NAMESPACE})
# get first element that matches xpath
elem = xpath(tree)[0]
# get xpath for an element (single-argument print() works on Python 2 and 3)
print(tree.getpath(elem))
I expected this code to produce a meaningful, human-readable XPath, but instead I get a string like /*/*/*[1].
Any idea what could be causing this and how I can diagnose the issue?
Note: Using Python 2.7.9 and lxml 2.3
It looks like getpath() (which wraps the underlying libxml2 call xmlGetNodePath) produces a positional XPath expression for namespaced documents.
User mzjn pointed out in the comments that since lxml v3.4.0 there is a function getelementpath() that produces a human-readable path with fully qualified tag names (using "Clark notation", i.e. {namespace-uri}tag). Strictly speaking, the result is an ElementPath expression meant for find() rather than an XPath, which is why it needs no prefix declarations. This function builds the path by traversing the tree from the node up to the root instead of calling the libxml2 API.
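To make the difference concrete, here is a minimal sketch that calls both methods on the document from the question. It assumes lxml 3.4 or newer is installed; the inline XML string, the prefix 'f', and the final string replacement are illustration choices rather than anything lxml prescribes.

```python
from lxml import etree

NS = 'http://example.com/2009/namespace'
XML = b"""<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://example.com/2009/namespace">
  <tag2>
    <tag3/>
    <tag3/>
  </tag2>
</tag1>"""

root = etree.fromstring(XML)
tree = root.getroottree()
elem = tree.xpath('/f:tag1/f:tag2/f:tag3', namespaces={'f': NS})[0]

# Positional expression produced via libxml2
print(tree.getpath(elem))

# ElementPath expression with Clark-notation tag names (lxml >= 3.4)
path = tree.getelementpath(elem)
print(path)

# The returned path can be passed back to find() to locate the same element
found = tree.find(path)
print(found is elem)

# Optional cosmetic step (an assumption of this sketch, not an lxml feature):
# swap the namespace URI for a short prefix to make the string easier to read
readable = path.replace('{%s}' % NS, 'f:')
print(readable)
```

Note that the prefixed string is only for display; it is the Clark-notation form that find() accepts without a namespace map.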
Similarly, if lxml v3.4+ is not available, one can write a tree traversal function of one's own.
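A hand-written traversal might look like the sketch below. It is written against the standard-library xml.etree.ElementTree so that it runs without lxml, and it builds a parent map because stdlib elements have no parent pointer; with lxml, elem.getparent() could be used instead. The function name readable_path and its prefixes parameter are hypothetical, and the sketch assumes the tree contains only element nodes.

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com/2009/namespace'
XML = """<tag1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://example.com/2009/namespace">
  <tag2>
    <tag3/>
    <tag3/>
  </tag2>
</tag1>"""


def readable_path(root, elem, prefixes=None):
    """Return an XPath-like string for elem by walking up to root.

    prefixes maps namespace URI -> prefix; tags whose namespace is not
    listed are left in Clark notation ({uri}local).
    """
    prefixes = prefixes or {}
    # Map every child to its parent so we can walk upwards
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def name(node):
        tag = node.tag
        if tag.startswith('{'):
            uri, local = tag[1:].split('}', 1)
            if uri in prefixes:
                return prefixes[uri] + ':' + local
        return tag

    steps = []
    node = elem
    while node is not None:
        step = name(node)
        parent = parent_of.get(node)
        if parent is not None:
            # Add a 1-based index only when same-named siblings exist
            same = [child for child in parent if child.tag == node.tag]
            if len(same) > 1:
                position = next(i for i, c in enumerate(same, 1) if c is node)
                step += '[%d]' % position
        steps.append(step)
        node = parent
    return '/' + '/'.join(reversed(steps))


root = ET.fromstring(XML)
first = root.find('{%s}tag2/{%s}tag3' % (NS, NS))
print(readable_path(root, first, {NS: 'f'}))  # /f:tag1/f:tag2/f:tag3[1]
```

Passing a prefix map yields a string that can be fed back to an XPath call with the same namespace mapping; omitting it leaves the tags in Clark notation, similar in spirit to what getelementpath() returns.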