I'm building a simple parser to handle a regular data feed at work. This post, XML to csv(-like) format, has been very helpful. I'm using a for loop, as in the solution there, to go through all of the elements and subelements I need to target, but I'm still a bit stuck.
For instance, my XML file is structured like so:
<root>
<product>
<identifier>12</identifier>
<identifier>ab</identifier>
<contributor>Alex</contributor>
<contributor>Steve</contributor>
</product>
</root>
I want to target only the second identifier and only the first contributor. Any suggestions on how I might do that?
Cheers!
The other answer you pointed to has an example of how to turn all instances of a tag into a list. You could just loop through those and discard the ones you're not interested in.
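A minimal sketch of that list-based approach, assuming the feed looks like the sample in the question (the XML is inlined here as a string so the snippet runs on its own; the variable names are only illustrative):

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the structure in the question (assumed well-formed).
XML = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>"""

root = ET.fromstring(XML)
product = root.find("product")

# findall() returns a plain Python list, so ordinary 0-based indexing applies.
identifiers = product.findall("identifier")
contributors = product.findall("contributor")

second_identifier = identifiers[1].text   # second item -> list index 1
first_contributor = contributors[0].text  # first item -> list index 0

print(second_identifier, first_contributor)
```

This keeps everything in ordinary Python, at the cost of building lists you mostly throw away.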
However, there's a way to do this directly with XPath: the mini-language supports item indexes in brackets:
import xml.etree.ElementTree as etree
# parse() accepts a filename directly
document = etree.parse("your.xml")
# find() returns the first matching Element, or None if nothing matches
secondIdentifier = document.find(".//product/identifier[2]")
firstContributor = document.find(".//product/contributor[1]")
print(secondIdentifier.text, firstContributor.text)
prints
ab Alex
Note that in XPath, the first index is 1, not 0.
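To make the indexing concrete, here is a small sketch using only the standard library. The inline XML is a cut-down version of the sample from the question, and the third lookup is there only to show what happens when the position does not exist:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><product>"
    "<identifier>12</identifier><identifier>ab</identifier>"
    "</product></root>"
)

first = root.find("./product/identifier[1]")    # position 1 is the first child
second = root.find("./product/identifier[2]")   # position 2 is the second child
missing = root.find("./product/identifier[3]")  # there is no third identifier

print(first.text, second.text, missing)
```

Because find returns None rather than raising when nothing matches, it is worth checking the result before reading .text on a real feed.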
ElementTree's find and findall only support a subset of XPath, described in the ElementTree documentation. Full XPath, described in brief on W3Schools and more fully in the W3C's normative document, is available from lxml, a third-party package, but one that is widely available. With lxml, the example would look like this:
import lxml.etree as etree
document = etree.parse("your.xml")
# xpath() always returns a list of matches, so take the first item
secondIdentifier = document.xpath(".//product/identifier[2]")[0]
firstContributor = document.xpath(".//product/contributor[1]")[0]
print(secondIdentifier.text, firstContributor.text)
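Since the feed presumably contains more than one product, the same indexed paths can be applied inside a loop over each product. The following is only a sketch with the standard library: the second product and the helper text_or_empty are made up for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical feed: the second product lacks a second identifier.
XML = """<root>
  <product>
    <identifier>12</identifier><identifier>ab</identifier>
    <contributor>Alex</contributor><contributor>Steve</contributor>
  </product>
  <product>
    <identifier>34</identifier>
    <contributor>Sam</contributor>
  </product>
</root>"""


def text_or_empty(element):
    """Return the element's text, or an empty string if find() gave None."""
    return element.text if element is not None else ""


root = ET.fromstring(XML)
rows = []
for product in root.iter("product"):
    # Paths are relative to the current product, positions are 1-based.
    rows.append([
        text_or_empty(product.find("identifier[2]")),
        text_or_empty(product.find("contributor[1]")),
    ])

# Write the rows as CSV into a string buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Evaluating the path per product keeps each row tied to one product, and a missing element becomes an empty field instead of an exception.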