Targeting specific sub-elements when parsing XML with Python

I'm building a simple parser to handle a regular data feed at work. This post, XML to csv(-like) format, has been very helpful. I'm using a for loop, as in that solution, to iterate over all the elements and sub-elements I need to target, but I'm still a bit stuck.

For instance, my XML file is structured like this:

<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>

I want to target only the second identifier and only the first contributor. Any suggestions on how I might do that?

Cheers!

The other answer you pointed to has an example of how to turn all instances of a tag into a list. You could just loop through those and discard the ones you're not interested in.
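Here is a minimal sketch of that loop-and-discard approach using only the standard library. It inlines the sample XML from the question (with a proper closing </root> tag) as a hypothetical XML string so it runs on its own. findall() returns all matching children as a plain Python list, so ordinary 0-based list indexing picks out the ones you want.

```python
import xml.etree.ElementTree as etree

# Sample document from the question, inlined so the sketch is self-contained.
XML = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>"""

root = etree.fromstring(XML)
product = root.find("product")

# findall() returns every matching child element, in document order.
identifiers = product.findall("identifier")
contributors = product.findall("contributor")

# Keep only the items of interest; Python list indexes start at 0.
second_identifier = identifiers[1].text
first_contributor = contributors[0].text
print(second_identifier, first_contributor)  # ab Alex
```

This works, but it builds the full list of matches first, which is what the XPath index below avoids writing by hand.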

However, you can do this directly with XPath, whose mini-language supports positional indexes in square brackets:

import xml.etree.ElementTree as etree

document = etree.parse("your.xml")  # parse() accepts a filename or an open file

second_identifier = document.find(".//product/identifier[2]")
first_contributor = document.find(".//product/contributor[1]")
# find() returns Element objects; .text holds each element's text content
print(second_identifier.text, first_contributor.text)

prints

ab Alex

Note that in XPath, the first index is 1, not 0.
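A small sketch of that off-by-one difference, using a hypothetical inline document: the XPath position [1] selects the same element as Python's findall(...)[0]. As far as I know, ElementTree also refuses position 0 with a SyntaxError rather than quietly returning nothing, which the sketch checks.

```python
import xml.etree.ElementTree as etree

XML = "<product><identifier>12</identifier><identifier>ab</identifier></product>"
product = etree.fromstring(XML)

# XPath position [1] and Python list index [0] both select the first <identifier>.
assert product.find("identifier[1]") is product.findall("identifier")[0]

# Position 0 is not valid XPath; ElementTree raises SyntaxError when compiling it.
rejected = False
try:
    product.find("identifier[0]")
except SyntaxError as exc:
    rejected = True
    print("rejected:", exc)
```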

ElementTree's find and findall support only a subset of XPath, which is described in the ElementTree documentation. Full XPath, described briefly on W3Schools and more fully in the W3C's normative specification, is available from lxml, a third-party package that is widely available. With lxml, the example would look like this:

import lxml.etree as etree

document = etree.parse("your.xml")

# xpath() always returns a list of matches, so take the first one
second_identifier = document.xpath(".//product/identifier[2]")[0]
first_contributor = document.xpath(".//product/contributor[1]")[0]
print(second_identifier.text, first_contributor.text)
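Since the goal is a CSV-like output from a feed that presumably contains many products, here is a stdlib-only sketch that applies the same positional paths relative to each <product>, so [2] and [1] count within that product only. The two-product XML string and the helper text_or_empty are hypothetical, added for illustration; a product missing the targeted element yields an empty cell instead of an AttributeError.

```python
import csv
import io
import xml.etree.ElementTree as etree

# Hypothetical feed with two products; element names follow the question.
XML = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
  <product>
    <identifier>34</identifier>
    <identifier>cd</identifier>
    <contributor>Sam</contributor>
  </product>
</root>"""


def text_or_empty(elem):
    """Hypothetical helper: element text, or '' when the path matched nothing."""
    if elem is None or elem.text is None:
        return ""
    return elem.text


root = etree.fromstring(XML)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["identifier", "contributor"])

for product in root.iter("product"):
    # Paths are relative to the current <product>, so positions restart per product.
    writer.writerow([
        text_or_empty(product.find("identifier[2]")),
        text_or_empty(product.find("contributor[1]")),
    ])

print(buffer.getvalue())
```

To write a real file instead, replace the StringIO buffer with open("out.csv", "w", newline="").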

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.