
Find the XPath of a string in Python with LXML

I'm trying to write a Python script that makes it easy to extract the XPath of elements in an XML or HTML file.

For instance, imagine we have the XML file below (test.xml), for which we would like to get the XPath of "blue":


I tried to use lxml, but I'm a bit lost:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

How can I get the XPath of the element in tree with text="blue"?

Thank you, Thomas

I'm not so sure this is a duplicate of the question that has been cited. That question and its answers appear to traverse the entire tree in Python, visiting each text node, whereas I read this question as simply asking for the XPath of a specific node given a criterion (in this case the node's text()), without having to walk every node by hand.

The first three lines given above are actually correct. You need only add one more to arrive at the simplest answer:

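from lxml import etree

# etree.parse returns an ElementTree; wrapping its root again also works
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

# Take the first element whose text contains "blue" and print its path
print(tree.getpath(doc.xpath('//*[contains(text(), "blue")]')[0]))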

That gives us the result:

(env) [tlum@localhost python-environments]$ python test.py

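Of course, if there is a possibility that the criterion won't match anything, or will match more than once, we'd have a little more work to do: the [0] index raises an IndexError when there is no match and silently discards every match after the first. I'll consider that beyond the scope of the question for now.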
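For what it's worth, here is a minimal sketch of how the zero-match and multiple-match cases might be handled. The sample document and the helper name find_paths are hypothetical (the original test.xml isn't reproduced here), so treat this as an illustration under those assumptions rather than the definitive approach. Note one deliberate difference from the one-liner above: contains(text(), ...) only inspects the first text node of each element, while the expression below checks every text node, and it passes the search string as an XPath variable to sidestep quoting issues.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document, used only for illustration;
# it stands in for a test.xml file on disk.
SAMPLE_XML = b"""<root>
  <colors>
    <color>red</color>
    <color>blue</color>
  </colors>
  <sky>deep blue</sky>
</root>"""


def find_paths(tree, needle):
    """Return the XPath of every element with a text node containing needle.

    The result is an empty list when nothing matches, so the caller never
    has to index into a possibly empty result.
    """
    # $needle is an XPath variable; lxml fills it from the keyword argument.
    matches = tree.xpath('//*[text()[contains(., $needle)]]', needle=needle)
    return [tree.getpath(element) for element in matches]


tree = etree.parse(BytesIO(SAMPLE_XML))

paths = find_paths(tree, "blue")
if not paths:
    print("no element contains that text")
else:
    for path in paths:
        print(path)
```

Returning a list leaves the policy to the caller: take paths[0] if only the first hit matters, or treat len(paths) > 1 as an error if the text is expected to be unique.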
