
How can I access the data in an XML node when using ElementTree

I am parsing the XML located at this link:

XML File to Parse

I need to access the data inside a node, but the program I have written seems to be telling me that there is nothing inside it. Here is my code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml
url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

#the comment below is part of another approach to the solution
#both approaches are leading me into the same direction
#it appears as if the data inside the node is not being parsed/extracted
#counts = comments.findall('comments/comment/count')

for count in comments.findall('count'):
    print comments.find('count').text

When I print the 'data' variable on its own, I get the complete XML tree. However, when I try to access the data inside a particular node, the node comes back empty.

I also tried printing the following code to see what data I would get back:

for child in comments:
    print child.tag, child.attrib

The output I got was:

note {}
comments {}

What am I doing wrong, and what am I missing?

One of the errors I get when trying a different looping strategy to access the node is this:

Traceback (most recent call last):
  File "xmlextractor.py", line 16, in <module>
    print comments.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

Please help and thanks!!!

UPDATE:

Looking through the etree docs for Python, I've realized that my approach has been trying to 'get' the node attributes instead of the contents of the nodes. I still haven't found an answer, but I am definitely closer!

2nd UPDATE:

So I tried out this code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml

url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

counts = comments.findall('comments/comment/count')

print len(counts)

for count in counts:
    print 'count', count.find('count').text

When I run the code above, this line:

print len(counts)

reports that I have 50 nodes in my counts list, but I still get the same error:

Traceback (most recent call last):
  File "xmlextractor.py", line 18, in <module>
    print 'count', count.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

I don't understand why it says that there is no 'text' attribute when I am trying to access the contents of the node.

What am I doing wrong??

A few comments on your approaches:

for count in comments.findall('count'):
    print comments.find('count').text

comments.findall('count') returns an empty list because comments, which is the root element here, does not have any immediate child elements named count. The loop body therefore never runs.

for child in comments:
    print child.tag, child.attrib

This iterates over the immediate child elements of your root node, which are called note and comments. Neither has any attributes, which is why an empty dict is printed for each.

# From update #2
for count in comments.findall('comments/comment/count'):
    print 'count', count.find('count').text

Here, count is an Element object representing a count node, which itself does not contain any count nodes. Thus, count.find('count') returns None, and accessing .text on None raises the AttributeError you are seeing.
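
To make the three cases above concrete, here is a minimal sketch that reproduces them without any network access. The SAMPLE string is a made-up stand-in whose layout (a root with note and comments children, and count nested under comments/comment) is only assumed from the output shown in the question; the real file may differ in its details.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the structure is assumed from the question,
# not copied from the real file.
SAMPLE = """<commentinfo>
  <note>Sample data</note>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>90</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the element.
direct_counts = root.findall('count')

# The direct children of the root are note and comments.
child_tags = [child.tag for child in root]

# A path relative to the root reaches the nested count elements.
nested_counts = root.findall('comments/comment/count')

# A count element has no count child of its own, so find() gives None.
inner = nested_counts[0].find('count')

print(direct_counts)       # []
print(child_tags)          # ['note', 'comments']
print(len(nested_counts))  # 2
print(inner is None)       # True
```

Each print call takes a single argument, so the snippet should behave the same on Python 2 and Python 3.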

If I understand correctly, your goal is to retrieve the text values of the count nodes. Here are two ways this can be achieved:

for count in comments.findall('comments/comment/count'):
    print count.text

for comment in comments.iter('comment'):
    print comment.find('count').text
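
If some comment elements might lack a count child, the second approach will hit the same AttributeError. A slightly more defensive sketch checks the result of find() before using it. The count_values helper and the SAMPLE data below are hypothetical names introduced for illustration, and converting the text with int() assumes the counts are whole numbers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the last comment deliberately has no count.
SAMPLE = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>90</count></comment>
    <comment><name>Carol</name></comment>
  </comments>
</commentinfo>"""


def count_values(xml_text):
    """Return the integer value of every count found under a comment."""
    root = ET.fromstring(xml_text)
    values = []
    for comment in root.iter('comment'):
        count = comment.find('count')
        # find() returns None when there is no matching child element.
        if count is not None and count.text:
            values.append(int(count.text))
    return values


values = count_values(SAMPLE)
print(values)       # [97, 90]
print(sum(values))  # 187
```

To run this against the real URL on Python 3, the download step would be urllib.request.urlopen(url).read() rather than urllib.urlopen(url).read(), and raw_input becomes input.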
