I'm trying to write a Python script to easily extract the XPath of elements in an XML or HTML file.
For instance, imagine we have the XML file below (test.xml), for which we would like to get the XPath of "blue":
<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>
I tried to use lxml, but I'm a bit lost:
from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())
How can I get the XPath of the element in tree whose text is "blue"?
Thank you, Thomas
I'm not so sure this is a duplicate of the question that has been cited. That question and its answers traverse the entire tree, visiting each text node, whereas I read this question as simply asking for the XPath of a specific node that matches a criterion (in this case, the node's text()) without having to visit every node manually.
The first three lines given above are actually correct; you only need to add one more to arrive at the simplest answer:
from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())
print(tree.getpath(doc.xpath('//*[contains(text(), "blue")]')[0]))
That gives us the result:
(env) [tlum@localhost python-environments]$ python test.py
/root/element[2]/contains/element[2]/name
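For anyone who wants to try this without creating test.xml, here is a minimal self-contained sketch that inlines the same document. It relies on two things lxml provides: getpath() is a method of the ElementTree wrapper (obtained here with getroottree(); the object returned by etree.parse() is already such a wrapper, so doc.getpath(...) would also work without re-wrapping), and an XPath predicate selects the element. It uses an exact text()="blue" test rather than contains(), which is an assumption about what "text is blue" should mean; contains() would also match something like "lightblue".

```python
from lxml import etree

# Same document as test.xml in the question, inlined so the snippet runs on its own.
XML = b"""<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>"""

root = etree.fromstring(XML)
# getpath() lives on the ElementTree wrapper, not on the element itself.
tree = root.getroottree()

# Exact match on a text node, so "lightblue" would not be picked up.
blue = tree.xpath('//*[text()="blue"]')[0]
print(tree.getpath(blue))  # /root/element[2]/contains/element[2]/name
```

The generated path only adds an index such as [2] where an element has same-named siblings, which is why contains and the final name appear without one.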
Of course, if the criterion might not be found, or might be found multiple times, we'd have a little more work to do, but I'll consider that beyond the scope of the question for now.
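One possible way to handle the zero-match and multiple-match cases is a small helper that returns a list of paths instead of indexing [0]. This is a sketch under stated assumptions: the function name paths_for_text and the sample documents are hypothetical, and it uses an exact text match. The search text is passed as an XPath variable ($wanted), which lxml's xpath() accepts as a keyword argument, so quotes inside the search text don't need escaping. Since the question also mentions HTML, the last call shows the same helper on a tree built with lxml's HTML parser.

```python
from lxml import etree


def paths_for_text(tree, text):
    """Return the XPath of every element that has a text node exactly equal to `text`.

    An empty list means no match; more than one entry means the text is not unique.
    """
    # $wanted is an XPath variable bound via keyword argument, avoiding quoting issues.
    matches = tree.xpath("//*[text()=$wanted]", wanted=text)
    return [tree.getpath(el) for el in matches]


# Hypothetical sample in which "blue" appears twice and "green" not at all.
root = etree.fromstring("<root><a><name>blue</name></a><b><name>blue</name></b></root>")
tree = root.getroottree()

print(paths_for_text(tree, "blue"))   # ['/root/a/name', '/root/b/name']
print(paths_for_text(tree, "green"))  # []

# The HTML parser wraps the fragment in <html><body>, and getpath works the same way.
html_tree = etree.HTML("<p>red</p><p>blue</p>").getroottree()
print(paths_for_text(html_tree, "blue"))  # ['/html/body/p[2]']
```

Returning a list keeps the decision about what to do with zero or several matches with the caller, rather than raising an IndexError on [0].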