
Parsing XML: Finding Interesting Elements Using ElementTree

I am using urllib and ElementTree to parse the XML returned by PubMed API calls.

An example of this is:

# Import modules for sending requests to URLs and for parsing XML
# Python version 3.4, using IEP (Interactive Editor for Python) as IDE
import urllib.request
import xml.etree.ElementTree as ET

#Obtain API Call and assign Element Object to Root
id_request = urllib.request.urlopen('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056')
id_pubmed = id_request.read()
root = ET.fromstring(id_pubmed)

I have been able to use ElementTree to load the data into the object root via ET.fromstring. My issue now is that I am having trouble finding the interesting elements in this object.

I am referring to: https://docs.python.org/2/library/xml.etree.elementtree.html and my XML format looks like: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056

I have tried:

#Parse Attempts.  Nothing returned.
for author in root.iter('Author'):
   print (author.attrib)

As well as

#No Return for author
for author in root.findall('Id'):
   author = author.find('author').text
   print (author)

Try iterating over the Item tags and checking each one's Name attribute:

for author in root.iter('Item'):
    # Every author is an <Item> element whose Name attribute is "Author"
    if author.attrib['Name'] == 'Author':
        print("Success")

Or:

author_list = [x for x in root.iter('Item') if x.attrib['Name'] == 'Author']
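As a minimal sketch of how that filter can yield the author names themselves, the snippet below parses a small inline XML string instead of calling the API. The sample is a hypothetical stand-in modeled on the eSummary layout (the author names and title are made up), and the text between the tags is read with .text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the eSummary layout; the values are made up.
SAMPLE_XML = """
<eSummaryResult>
  <DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Doe J</Item>
      <Item Name="Author" Type="String">Roe A</Item>
    </Item>
    <Item Name="Title" Type="String">Example title</Item>
  </DocSum>
</eSummaryResult>
"""

root = ET.fromstring(SAMPLE_XML)

# iter() walks the whole tree, so the nested Item elements are visited too.
# .get() returns None instead of raising KeyError if Name is missing.
author_names = [item.text for item in root.iter('Item')
                if item.get('Name') == 'Author']
print(author_names)
```

With the real response, root would come from ET.fromstring(id_pubmed) as in the question, and the same list comprehension should apply as long as the response has the layout assumed here.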

I don't know if you can iterate by attribute
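For what it's worth, iter() only filters by tag, but findall() accepts a limited XPath subset that includes attribute predicates such as [@Name='Author']. The sketch below assumes the same eSummary-style layout; the inline XML is a hypothetical sample with made-up names, not the real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the eSummary layout; the values are made up.
SAMPLE_XML = """
<eSummaryResult>
  <DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Doe J</Item>
      <Item Name="Author" Type="String">Roe A</Item>
    </Item>
  </DocSum>
</eSummaryResult>
"""

root = ET.fromstring(SAMPLE_XML)

# ".//" searches all descendants; the predicate keeps only the Item
# elements whose Name attribute equals "Author".
authors = root.findall(".//Item[@Name='Author']")
author_names = [author.text for author in authors]
print(author_names)
```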

.attrib is not a method. It is a dictionary of the element's XML attributes (such as Name), not the text between the tags. I think you want to use either .tag or .text instead. I'm not exactly sure what data you are trying to pull from this tree, but you can also loop over the author elements and read each one's .text.

Edit: The esummaryResult tag seems pointless unless you will have more DocSum tags, but the information you want is in the .text value. Try printing author.tag so you can check what you are currently iterating over.
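One way to do that check is to print the tag, attributes, and text of every element in the tree. This is only a sketch on a hypothetical inline sample (made-up values, assumed eSummary-style layout), not on the live response.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the eSummary layout; the values are made up.
SAMPLE_XML = """
<eSummaryResult>
  <DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Doe J</Item>
    </Item>
  </DocSum>
</eSummaryResult>
"""

root = ET.fromstring(SAMPLE_XML)

# Collect (tag, attributes, stripped text) for every element, root included.
# .text can be None for empty elements, hence the "or ''" fallback.
rows = [(el.tag, dict(el.attrib), (el.text or '').strip())
        for el in root.iter()]

for tag, attrib, text in rows:
    print(tag, attrib, repr(text))
```

The output makes it clear that the author name lives in .text, while .attrib only holds Name and Type.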


 