How can I access the data in an XML node when using ElementTree?

I am parsing the XML located at this link:

XML File to Parse

I need to access the data inside the node, but the program I have written seems to tell me that there is nothing inside it. Here is my code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml
url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

#the comment below is part of another approach to the solution
#both approaches are leading me into the same direction
#it appears as if the data inside the node is not being parsed/extracted
#counts = comments.findall('comments/comment/count')

for count in comments.findall('count'):
    print comments.find('count').text

When I print out the 'data' variable alone, I get the complete XML tree. However, when I try to access the data inside a particular node, the node comes back empty.

I also tried running the following code to see what data I would get back:

for child in comments:
    print child.tag, child.attrib

The output I got was:

note {}
comments {}

What am I doing wrong, and what am I missing?

One of the errors I get when trying a different looping strategy to access the node is this:

Traceback (most recent call last):
  File "xmlextractor.py", line 16, in <module>
    print comments.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

Please help, and thanks!


I've realized from looking through the Python etree docs that my approach has been trying to 'get' the node attributes instead of the contents of the nodes. I still haven't found an answer, but I am definitely closer!

2nd UPDATE:

So I tried out this code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml

url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

counts = comments.findall('comments/comment/count')

print len(counts)

for count in counts:
    print 'count', count.find('count').text

When I run the code above, my:

print len(counts)

outputs that I have 50 nodes in my counts list, but I still get the same error:

Traceback (most recent call last):
  File "xmlextractor.py", line 18, in <module>
    print 'count', count.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

I don't understand why it says that there is no 'text' attribute when I am trying to access the contents of the node.

What am I doing wrong?

A few comments on your approaches:

for count in comments.findall('count'):
    print comments.find('count').text

comments.findall('count') returns an empty list because comments (the root element) does not have any immediate child elements named count. A plain tag name passed to findall matches direct children only.
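To make the point about immediate children concrete, here is a minimal sketch. It uses Python 3 syntax (the question's code is Python 2), and SAMPLE is a hypothetical stand-in shaped like the structure described in the question; the root tag name commentinfo and the values are assumptions, not the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a root with "note" and "comments" children,
# where each "comment" holds a "count" element (assumed layout).
SAMPLE = """<commentinfo>
  <note>sample data</note>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>3</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the root.
direct = root.findall('count')

# A path relative to the root walks down level by level.
nested = root.findall('comments/comment/count')

# './/count' matches count elements at any depth below the root.
anywhere = root.findall('.//count')

print(len(direct), len(nested), len(anywhere))  # prints: 0 2 2
```

The './/count' form is handy when the exact nesting is not known in advance, while the explicit path documents the expected structure.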

for child in comments:
    print child.tag, child.attrib

This iterates over the immediate child elements of your root node, which, according to your output, are note and comments.

# From update #2
for count in comments.findall('comments/comment/count'):
    print 'count', count.find('count').text

Here, count is an Element object representing a count node, which itself does not contain any count nodes. Thus, count.find('count') returns None, and accessing .text on None raises the AttributeError you are seeing.
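The behaviour described above can be reproduced in isolation. This is a small sketch in Python 3 syntax; the inline XML is a hypothetical sample (the root tag commentinfo and the value 97 are assumptions), used only to show that find returns None when there is no matching child.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-comment sample with the same nesting as in the question.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# This is already the count element itself.
count = root.find('comments/comment/count')

# Looking for another count inside it finds nothing, so find() returns None.
inner = count.find('count')
print(inner is None)   # prints: True

# The value lives in the element's own text, not in a child element.
print(count.text)      # prints: 97

# If a missing child is a real possibility, check for None before using .text.
value = inner.text if inner is not None else None
print(value)           # prints: None
```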

If I understand correctly, your goal is to retrieve the text values of the count nodes. Here are two ways to achieve this:

for count in comments.findall('comments/comment/count'):
    print count.text

for comment in comments.iter('comment'):
    print comment.find('count').text
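For reference, here is a self-contained sketch of both approaches in Python 3 syntax, run against a hypothetical inline sample instead of the downloaded file. The helper names counts_by_path and counts_by_iter, the root tag commentinfo, and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the structure described in the question.
SAMPLE = """<commentinfo>
  <note>sample data</note>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def counts_by_path(xml_text):
    """Collect count values by following the explicit path from the root."""
    root = ET.fromstring(xml_text)
    return [int(c.text) for c in root.findall('comments/comment/count')]


def counts_by_iter(xml_text):
    """Collect count values by visiting every comment element at any depth."""
    root = ET.fromstring(xml_text)
    return [int(comment.find('count').text) for comment in root.iter('comment')]


print(counts_by_path(SAMPLE))       # prints: [97, 3]
print(counts_by_iter(SAMPLE))       # prints: [97, 3]
print(sum(counts_by_path(SAMPLE)))  # prints: 100
```

In Python 3 the download step from the question would use urllib.request.urlopen(url).read() in place of urllib.urlopen(url).read(), and the result can be passed straight to either helper.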

