Finding specific XML attribute of child element using Python?

<root>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec7" sec-type="funding">
            <title>Funding</title>
            <p>This work was supported by the NIH</p>
          </sec>
        </back>

I have an XML file of scientific journal metadata and am trying to extract just the funding information for each article. I need the text contained within the p tag. While the id attribute of the sec element varies between articles, the sec-type is always "funding".

I have been trying to do this in Python 3 using ElementTree, but this only dumps the title elements:

import xml.etree.ElementTree as ET

# The file name must be passed as a string
tree = ET.parse('journals.xml')
root = tree.getroot()
for title in root.iter("title"):
    ET.dump(title)

Any help would be greatly appreciated!

You can use findall with an XPath expression to extract the values you want. I extrapolated from your example data a little bit in order to complete the document and have two p elements:

<root>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec7" sec-type="funding">
            <title>Funding</title>
            <p>This work was supported by the NIH</p>
          </sec>
          <sec id="sec8" sec-type="funding">
            <title>Funding</title>
            <p>I'm a little teapot</p>
          </sec>
        </back>
      </body>
    </front>
  </article>
</root>

The following extracts the text content of every p node directly under a sec node where sec-type="funding":

import xml.etree.ElementTree as ET

doc = ET.parse('journals.xml')
print([p.text for p in doc.findall('.//sec[@sec-type="funding"]/p')])

Result:

['This work was supported by the NIH', "I'm a little teapot"]
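If you need the funding text grouped per article rather than as one flat list, the same XPath can be run relative to each article element. The following is only a sketch: the two-article sample document, the second funding sentence, and the helper name funding_by_article are made up for illustration. It also joins itertext() instead of reading p.text, on the assumption that some p elements may contain inline markup such as bold, in which case p.text would stop at the first child tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data that mirrors the structure shown in the question.
SAMPLE = """
<root>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec7" sec-type="funding">
            <title>Funding</title>
            <p>This work was supported by the NIH</p>
          </sec>
        </back>
      </body>
    </front>
  </article>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec2" sec-type="funding">
            <title>Funding</title>
            <p>Supported by <bold>grant 123</bold>.</p>
          </sec>
        </back>
      </body>
    </front>
  </article>
</root>
"""


def funding_by_article(root):
    """Return one list of funding paragraphs per <article> element."""
    results = []
    for article in root.iter("article"):
        # Search only within this article, matching on sec-type, not id.
        paragraphs = [
            # itertext() also collects text inside nested inline elements.
            "".join(p.itertext()).strip()
            for p in article.findall('.//sec[@sec-type="funding"]/p')
        ]
        results.append(paragraphs)
    return results


root = ET.fromstring(SAMPLE)
funding = funding_by_article(root)
print(funding)
```

An article without a funding section simply yields an empty list, so the result stays aligned with the order of the article elements in the file.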
