
Parsing XML: Finding Interesting Elements Using ElementTree

I am using urllib and ElementTree to parse the XML returned by PubMed API calls.

An example of this is:

# Import the modules needed to send the request and parse the XML
# Python 3.4, using IEP (Interactive Editor for Python) as the IDE
import urllib.request
import xml.etree.ElementTree as ET

# Call the API and parse the response into an Element assigned to root
id_request = urllib.request.urlopen('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056')
id_pubmed = id_request.read()
root = ET.fromstring(id_pubmed)

I have now been able to use ElementTree to load the data into the object root via ET.fromstring. My issue is that I am having trouble finding the interesting elements in this object.

I am referring to https://docs.python.org/2/library/xml.etree.elementtree.html, and my XML looks like the response at http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=1757056

I have tried:

#Parse Attempts.  Nothing returned.
for author in root.iter('Author'):
   print (author.attrib)

As well as:

#No Return for author
for author in root.findall('Id'):
   author = author.find('author').text
   print (author)

Try iterating over the Item tag and checking its Name attribute:

for author in root.iter('Item'):
    if author.attrib['Name'] == 'Author':
        print("Success")

Or:

author_list = [x for x in root.iter('Item') if x.attrib['Name'] == 'Author']

I don't know if you can iterate by attribute directly.
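For what it's worth, ElementTree's limited XPath support does accept an attribute predicate of the form [@attrib='value'], so the filter can be moved into the query itself. A minimal sketch, assuming a hand-written sample shaped like the eSummary output (SAMPLE and its author names are placeholders, not real API data):

```python
import xml.etree.ElementTree as ET

# Hand-written sample mimicking the eSummary layout; the values are placeholders.
SAMPLE = """<eSummaryResult>
  <DocSum>
    <Id>1757056</Id>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Author A</Item>
      <Item Name="Author" Type="String">Author B</Item>
    </Item>
    <Item Name="Title" Type="String">Placeholder title</Item>
  </DocSum>
</eSummaryResult>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants; the predicate keeps only Item elements
# whose Name attribute equals "Author".
authors = [item.text for item in root.findall(".//Item[@Name='Author']")]
print(authors)
```

With the real response, root would come from ET.fromstring(id_pubmed) as in the question, and the same findall call should apply as long as the response keeps this Item/Name layout.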

.attrib is not a method and does not return the text inside a tag; it is a dictionary of the element's XML attributes. I think you may want to use either .tag or .text instead. I'm not exactly sure what data you are trying to pull from this tree, but you can also loop over the author elements and read their values.

Edit: The esummaryResult tag seems pointless unless you will have more DocSum tags. The information you want is in the .text value. Try printing author.tag so you can check what you are currently iterating over.
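To see what each of the three holds, it can help to print .tag, .attrib and .text side by side for every element. The sketch below assumes a hand-written DocSum fragment (SAMPLE and its values are placeholders, not real API data). It also suggests why root.iter('Author') finds nothing: in this layout the tag is Item, and "Author" is only the value of the Name attribute.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment mimicking one DocSum; the values are placeholders.
SAMPLE = """<DocSum>
  <Id>1757056</Id>
  <Item Name="AuthorList" Type="List">
    <Item Name="Author" Type="String">Author A</Item>
  </Item>
  <Item Name="Title" Type="String">Placeholder title</Item>
</DocSum>"""

root = ET.fromstring(SAMPLE)

# Collect (tag, Name attribute, stripped text) for every Item, in document order.
described = [
    (el.tag, el.attrib.get("Name"), (el.text or "").strip())
    for el in root.iter("Item")
]
for tag, name, text in described:
    print(tag, name, repr(text))

# No element has the tag "Author", so this list is empty.
by_tag = list(root.iter("Author"))
print(len(by_tag))
```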
