Find the XPath of a string in Python with LXML

Question

I'm trying to write a Python script that makes it easy to extract the XPath of elements in an XML or HTML file.

For instance, imagine we have the XML file below (test.xml), and we would like to get the XPath of "blue":

<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>

I tried to use LXML, but I'm a bit lost:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

How can I get the XPath of the element in tree with text="blue"?

Thank you, Thomas

Answer 1

I'm not so sure this is a duplicate of the question that has been cited. That question and its answers appear to traverse the entire tree, visiting each text node, whereas I read this question as simply asking for the XPath of a specific node given a criterion (in this case the node's text()), without having to visit every node.

The first three lines given above are actually correct; you need only add one more to arrive at the simplest answer:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

print(tree.getpath(doc.xpath('//*[contains(text(), "blue")]')[0]))

That gives us the result:

(env) [tlum@localhost python-environments]$ python test.py
/root/element[2]/contains/element[2]/name

Of course, if there is a possibility that the criterion won't match, or will match multiple times, we'd have a little more work to do, but I'll consider that beyond the scope of the question for now.
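For anyone who does need the zero-match or multiple-match cases, here is a minimal sketch of one way to handle them. It is not the only approach: the helper name `xpaths_for_text` is made up for illustration, and the XML is inlined as a string so the snippet runs without test.xml. The predicate also differs slightly from the one-liner above: it tests every text node of an element rather than only the first, and it passes the search string as an XPath variable so that quotes in it need no escaping.

```python
from lxml import etree

# Same document as test.xml in the question, inlined so the sketch is self-contained.
XML = b"""<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>"""


def xpaths_for_text(tree, needle):
    """Hypothetical helper: XPath of every element with a text node containing needle."""
    # $needle is bound as an XPath variable, so the search string is never
    # spliced into the expression itself.
    matches = tree.xpath('//*[text()[contains(., $needle)]]', needle=needle)
    # An empty list means "not found"; more than one entry means several matches.
    return [tree.getpath(element) for element in matches]


tree = etree.ElementTree(etree.fromstring(XML))

print(xpaths_for_text(tree, "blue"))   # one match
print(xpaths_for_text(tree, "red"))    # two matches: a <value-ref> and a <name>
print(xpaths_for_text(tree, "green"))  # no match, empty list
```

Returning a list leaves it to the caller to decide whether zero or several matches is an error. Note also that etree.parse() already returns an ElementTree, so in the original script doc.getpath(...) should work directly, without building a second tree.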
