Find the XPath of a string in Python with LXML

I'm trying to develop a Python script to easily extract the XPath of elements in an XML or HTML file.

For instance, imagine we have the XML file below (test.xml), for which we would like to get the XPath of "blue":

<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>

I tried to use lxml, but I'm a bit lost:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

How can I get the XPath of the element in tree with text="blue"?

Thank you, Thomas

I'm not so sure this is a duplicate of the question that has been cited. That question and its answers appear to traverse the entire tree, visiting each text node, whereas I read this question as simply asking for the XPath of a specific node given a criterion (in this case the node's text()), without having to walk every node yourself.

The first three lines given above are actually correct; you need only add one more to arrive at the simplest answer:

from lxml import etree
doc = etree.parse('test.xml')
tree = etree.ElementTree(doc.getroot())

print(tree.getpath(doc.xpath('//*[contains(text(), "blue")]')[0]))

That gives us the result:

(env) [tlum@localhost python-environments]$ python test.py
/root/element[2]/contains/element[2]/name

Of course, if the criterion might not match at all, or might match more than once, we'd have a little more work to do, but I'll consider that beyond the scope of the question for now.
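For the case where the text matches nothing or matches several elements, here is a minimal sketch of one way to handle it. The helper name find_paths and the $needle XPath variable are my own choices for illustration, not part of lxml or of the original question, and the XML is inlined so the snippet runs without test.xml.

```python
from lxml import etree

# Same document as test.xml in the question, inlined so the snippet is self-contained.
XML = """<root>
  <element>
    <name>Element1</name>
    <contains>
      <element>
        <name>color</name>
        <value-ref>/Colors/red</value-ref>
      </element>
    </contains>
  </element>
  <element>
    <name>Colors</name>
    <contains>
      <element>
        <name>red</name>
        <value>0xFF0000</value>
      </element>
      <element>
        <name>blue</name>
        <value>0x0000FF</value>
      </element>
    </contains>
  </element>
</root>"""


def find_paths(tree, text):
    """Return the XPath of every element whose text contains `text` (hypothetical helper)."""
    # Passing the search string as an XPath variable avoids quoting problems
    # when `text` itself contains quote characters.
    matches = tree.xpath('//*[contains(text(), $needle)]', needle=text)
    # xpath() returns a list, which is simply empty when nothing matches.
    return [tree.getpath(element) for element in matches]


# fromstring() returns an Element, so wrap it to get getpath().
tree = etree.ElementTree(etree.fromstring(XML))

print(find_paths(tree, "blue"))   # ['/root/element[2]/contains/element[2]/name']
print(find_paths(tree, "green"))  # []
print(find_paths(tree, "red"))    # two matches: the <value-ref> and the <name>
```

Returning a list leaves it to the caller to decide whether zero or multiple matches is an error. Note that contains() does a substring match, which is why "red" also finds /Colors/red; use text() = $needle instead if only exact matches are wanted.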
