
Get etree Element with attribute, or containing subelement with attribute

I have an XML file to parse, and I need to find elements by id.

In the example code, I need to find the name of the driver, but I don't know whether my id belongs to the vehicle, the engine, or the block. I would like a solution that works with arbitrary XML inside vehicle (the existence of driver is guaranteed).

<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>

What I have tried

I tried getting the elements by their id and then, if they weren't vehicle tags, navigating up the tree to the vehicle. However, it seems Python's elem.find() returns None if the result is outside elem, so it cannot reach an ancestor.
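Walking up the tree is still possible with the standard library. ElementTree elements keep no link to their parent, so one workaround is to build a child-to-parent table once and climb it. The following is a minimal sketch. It assumes the ids contain no single quote, because they are pasted into the path string. `parent_of` and `driver_via_parents` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
DATA = """<road>
  <vehicle id="16">
    <driver>Bob Johnson</driver>
    <engine id="532">
      <type>V8</type>
      <block id="113"><material>Aluminium</material></block>
    </engine>
  </vehicle>
  <vehicle id="452">
    <driver>Dave Edwards</driver>
    <engine id="212">
      <type>Inline 6</type>
      <block id="381"><material>Cast Iron</material></block>
    </engine>
  </vehicle>
</road>"""

root = ET.fromstring(DATA)

# Elements do not know their parent, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def driver_via_parents(element_id):
    """Hypothetical helper: find the element by id, then climb to its <vehicle>."""
    # Assumes element_id contains no single quote (it is pasted into the path).
    elem = root.find(".//*[@id='%s']" % element_id)
    while elem is not None and elem.tag != "vehicle":
        elem = parent_of.get(elem)  # becomes None once we climb past the root
    if elem is None:
        return None
    return elem.findtext("driver")
```

For example, `driver_via_parents("113")` climbs from the block through the engine to the first vehicle. The parent table costs one pass over the tree, so it is worth building once if many ids will be looked up.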

Looking at the docs, they have this example:

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

But I don't see how to make that work for any descendant, as opposed to a descendant on a specific level.
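With plain xml.etree.ElementTree, the "any descendant" part can also be handled top-down. For each vehicle, check its own id and then search below it with `.//*[@id='...']`, which ElementTree's limited XPath subset does support at any depth. The following is a minimal sketch. It assumes the ids contain no single quote. `driver_for_id` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
DATA = """<road>
  <vehicle id="16">
    <driver>Bob Johnson</driver>
    <engine id="532">
      <type>V8</type>
      <block id="113"><material>Aluminium</material></block>
    </engine>
  </vehicle>
  <vehicle id="452">
    <driver>Dave Edwards</driver>
    <engine id="212">
      <type>Inline 6</type>
      <block id="381"><material>Cast Iron</material></block>
    </engine>
  </vehicle>
</road>"""

root = ET.fromstring(DATA)


def driver_for_id(element_id):
    """Hypothetical helper: driver of the <vehicle> that has, or contains, element_id."""
    for vehicle in root.iter("vehicle"):
        # './/' does not include the vehicle itself, so check its own id first.
        if vehicle.get("id") == element_id:
            return vehicle.findtext("driver")
        # './/*[@id=...]' matches a descendant at any depth below this vehicle.
        # Assumes element_id contains no single quote (it is pasted into the path).
        if vehicle.find(".//*[@id='%s']" % element_id) is not None:
            return vehicle.findtext("driver")
    return None
```

This approach avoids the parent lookup entirely. The trade-off is that it searches each vehicle's subtree instead of jumping straight to the matching element.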

Note: All the snippets below use the lxml library. To install it, run: pip install lxml

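You should use root.xpath(..), not root.findall(..). findall only understands the limited ElementPath subset of XPath, while xpath evaluates full XPath 1.0 expressions.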

>>> root.xpath("//vehicle/driver/text()")
['Bob Johnson', 'Dave Edwards']

If you want to extract the driver's name for a given ID, you'd do:

>>> vehicle_id = "16"
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (vehicle_id, vehicle_id))
['Bob Johnson']

UPDATE: To get the driver's name for an id nested at any depth, you'd do:

>>> i = '16'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '532'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '113'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']

If you know the id but don't know whether it belongs to the vehicle, the engine or the block, you can solve this with an XPath expression. You have to use lxml.etree instead of xml.etree.ElementTree, because ElementTree has very limited XPath support. Use the ancestor-or-self axis:

input_id = "your ID"
print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

This would print:

  • Bob Johnson if input_id is 16, 532 or 113
  • Dave Edwards if input_id is 452, 212 or 381

Complete working example:

import lxml.etree as ET

data = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>
"""

root = ET.fromstring(data)
for input_id in [16, 532, 113, 452, 212, 381]:
    print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

Prints:

Bob Johnson
Bob Johnson
Bob Johnson
Dave Edwards
Dave Edwards
Dave Edwards
