
Python lxml: get the parent element when you know the child's text, using XPath

I have the following XML file, test.xml:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.CountryCode</FieldName>
            <FieldValue>DE</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.Address.PostalCode</FieldName>
            <FieldValue>10827</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.LocationTypeCode</FieldName>
            <FieldValue>O</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.Address.City</FieldName>
            <FieldValue>Berlin</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

What I want to do is get the element with tag TxnField that has a child FieldName with the text Pickup.DateTime. It is important to get the parent element, so I need to get this:

<TxnField>
  <FieldName>Pickup.DateTime</FieldName>
  <FieldValue>2016-05-28T03:26:05</FieldValue>
  <FieldIndex>0</FieldIndex>
</TxnField>

What I have so far is the following:

from lxml import etree
xml_parser = etree.XMLParser(remove_blank_text=True)
xml_tree = etree.parse('test.xml', xml_parser)

p_time = xml_tree.xpath("//*[local-name()='TxnField']/*[text()='Pickup.DateTime']")
print(p_time[0].tag) # {http://www.someaddress.com/someendpoint}FieldName

But this gives me the FieldName element that contains the text Pickup.DateTime, and I am interested in getting its parent, as shown above.

As a side note: it took me almost an hour even to get this far, because I find the lxml documentation very cumbersome. If anyone has a link to a good tutorial, please post it, at least as a comment. Thanks!

I have found out how to get it:

p_time = xml_tree.xpath("//*[local-name()='TxnField']/*[text()='Pickup.DateTime']/./..")
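A note on the expression above: the /./ step only re-selects the current node, so /.. on its own (or the equivalent /parent::*) should reach the same TxnField. Here is a minimal sketch to check that. It assumes a trimmed-down copy of test.xml inlined as a byte string so that it runs on its own; the variable names are only illustrative.

```python
from lxml import etree

# Assumption: a shortened copy of test.xml, inlined so the sketch is self-contained.
XML = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.City</FieldName>
            <FieldValue>Berlin</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

root = etree.fromstring(XML)

# Selects the FieldName child, exactly as in the question.
base = "//*[local-name()='TxnField']/*[text()='Pickup.DateTime']"

with_dot = root.xpath(base + "/./..")            # expression from the post above
without_dot = root.xpath(base + "/..")           # same thing without the no-op step
parent_axis = root.xpath(base + "/parent::*")    # explicit parent axis

# Local name of the selected parent, without the namespace part.
parent_tag = etree.QName(with_dot[0]).localname
print(parent_tag, len(with_dot), len(without_dot), len(parent_axis))
```

All three variants return a list, so the element itself is still at index 0.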

Here is a suggestion:

from lxml import etree

NSMAP = {"s": "http://www.someaddress.com/someendpoint"}

xml_parser = etree.XMLParser(remove_blank_text=True)
xml_tree = etree.parse('test.xml', xml_parser)

p_time = xml_tree.xpath("//s:FieldName[.='Pickup.DateTime']", namespaces=NSMAP)[0]
parent = p_time.getparent()
  • The s prefix is bound to the http://www.someaddress.com/someendpoint namespace. It is used in the XPath expression instead of local-name().
  • The call to xpath() returns a list with one item (the wanted FieldName element), and the getparent() method is then used to find its parent.
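One thing to watch with this approach: indexing with [0] raises IndexError when nothing matches. A minimal sketch of a guarded version follows. The helper name find_txn_field is hypothetical, and the XML is a trimmed-down copy of test.xml inlined so the snippet runs on its own. It also passes the field name as an XPath variable, which lxml accepts as a keyword argument to xpath(), instead of pasting it into the expression.

```python
from lxml import etree

# Assumption: a shortened copy of test.xml, inlined so the sketch is self-contained.
XML = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.PostalCode</FieldName>
            <FieldValue>10827</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

NSMAP = {"s": "http://www.someaddress.com/someendpoint"}


def find_txn_field(root, field_name):
    """Hypothetical helper: return the TxnField whose FieldName equals
    field_name, or None when there is no such field."""
    # $name is an XPath variable; its value comes from the name= keyword.
    matches = root.xpath(
        "//s:FieldName[. = $name]", namespaces=NSMAP, name=field_name
    )
    return matches[0].getparent() if matches else None


root = etree.fromstring(XML)

parent = find_txn_field(root, "Pickup.DateTime")
# Once the parent is in hand, its other children are easy to read.
value = parent.findtext("s:FieldValue", namespaces=NSMAP)
missing = find_txn_field(root, "No.Such.Field")
print(value, missing)
```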

There is more than one way to do it!

Btw, I think this is a pretty good lxml tutorial: http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/index.html

The preferable XPath expression, IMHO, for getting a parent node that has a child node with a certain value is something like this:

//d:TxnField[d:FieldName='Pickup.DateTime']

The above assumes that you have mapped the prefix d to the default namespace URI. But from your comments, it seems you prefer to ignore namespaces here, so this is the equivalent expression that does not require a registered namespace prefix:

//*[local-name()='TxnField'][*[local-name()='FieldName' and .='Pickup.DateTime']]
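For completeness, this is roughly how both expressions could be run from lxml. It is only a sketch: the XML is a trimmed-down copy of test.xml inlined so the snippet is self-contained, and the prefix d is an arbitrary choice made here (only the URI has to match the document).

```python
from lxml import etree

# Assumption: a shortened copy of test.xml, inlined so the sketch is self-contained.
XML = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.someaddress.com/someendpoint">
      <objTransaction>
        <DataFields>
          <TxnField>
            <FieldName>Pickup.Address.CountryCode</FieldName>
            <FieldValue>DE</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
          <TxnField>
            <FieldName>Pickup.DateTime</FieldName>
            <FieldValue>2016-05-28T03:26:05</FieldValue>
            <FieldIndex>0</FieldIndex>
          </TxnField>
        </DataFields>
      </objTransaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

root = etree.fromstring(XML)

# Map the prefix "d" to the namespace that SubmitTransaction declares as default.
NS = {"d": "http://www.someaddress.com/someendpoint"}

# The predicate filters TxnField by its child, so the parent is selected directly.
by_prefix = root.xpath(
    "//d:TxnField[d:FieldName='Pickup.DateTime']", namespaces=NS
)

# Namespace-agnostic variant, no prefix mapping needed.
by_local_name = root.xpath(
    "//*[local-name()='TxnField']"
    "[*[local-name()='FieldName' and .='Pickup.DateTime']]"
)

txn_field = by_prefix[0]
value = txn_field.findtext("d:FieldValue", namespaces=NS)
print(etree.QName(txn_field).localname, value)
```

Because the predicate is applied to TxnField itself, there is no need to step back up with .. or getparent() afterwards.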


 