
Python: XML parsing using xml.dom.minidom - iterating over collection.getElementsByTagName

I have some XML I'm trying to parse using Python 2.7. The XML is in the format below. In the code (also included below), printing collection.getAttribute('ProjectId') gives me the ID, but once I assign the result of collection.getElementsByTagName to tags and run it through a loop, I get no output (and no error message). Any clues?

XML:

<SummaryReport ProjectId="37f8d135-1f1d-4e57-9b7d-b084770c6bf5" EntityId="016fbc07-69f0-407e-b5b5-0b0b6bba4307" Status="Failed">
  <TotalCount>0</TotalCount>
  <SuccessfulCount>0</SuccessfulCount>
  <FailedCount>0</FailedCount>
  <StartUtcTime>2015-09-09T16:43:11.810715Z</StartUtcTime>
  <EndUtcTime>2015-09-09T16:43:44.5418427Z</EndUtcTime>
  <IsIncremental>false</IsIncremental>
  <OnDemand>true</OnDemand>
  <TrackingId>c0972936-c8b6-4cdb-b089-d08c6f9702aa</TrackingId>
  <Message>An error occurred during content building: Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index</Message>
  <LogEntries>
    <LogEntry>
      <Level>Info</Level>
      <LogCodeType>System</LogCodeType>
      <LogCode>PhaseSucceedInfo</LogCode>
      <Name>Phase</Name>
      <Message>'Load Metadata' succeeded in 00:00:00.1905878 seconds.</Message>
      <Anchor>Info_7333babe-fc51-4b45-9167-bf263e7babcb</Anchor>
    </LogEntry>
    <LogEntry>
      <Level>Info</Level>
      <LogCodeType>System</LogCodeType>
      <LogCode>PublishRequest</LogCode>
      <Name>PublishTocAndArticleInit</Name>
      <Message>'Load Metadata' succeeded in 00:00:01.1905878 seconds.</Message>
      <Anchor>Info_51c10e71-d99a-49f9-b4aa-d83dc273426a</Anchor>
    </LogEntry>
  </LogEntries>
</SummaryReport>

Code:

#!/usr/bin/python

from xml.dom.minidom import parse
import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("file.xml")
collection = DOMTree.documentElement

#Get all the tags under summaryreport
tags = collection.getElementsByTagName("SummaryReport")


#print tag info
for tag in tags:
    print '*******Tag Info************'

    print 'Project Id: %s' % tag.getAttribute('ProjectId')

You get no output because collection.getElementsByTagName("SummaryReport") returns an empty list:

>>> tags = collection.getElementsByTagName("SummaryReport")
>>> print(tags)
[]

That makes sense, since collection already references the SummaryReport element, and that element has no descendant with the same name.
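
As a minimal sketch of the point above: the attributes can be read straight from the root element, or the search can be run on the document object, whose getElementsByTagName also considers the root. The XML here is a trimmed copy of the question's file, inlined with parseString so the snippet runs on its own; the variable names are only illustrative.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the question's XML, inlined so the sketch is self-contained.
XML = """<SummaryReport ProjectId="37f8d135-1f1d-4e57-9b7d-b084770c6bf5" Status="Failed">
  <TotalCount>0</TotalCount>
</SummaryReport>"""

doc = parseString(XML)
root = doc.documentElement  # this is already the SummaryReport element

# Called on an element, getElementsByTagName looks at descendants only,
# so the root never matches itself and the list is empty.
nested = root.getElementsByTagName("SummaryReport")

# Called on the document, the same search does include the root element.
from_doc = doc.getElementsByTagName("SummaryReport")

# The attributes live on the root, so they can be read there directly.
project_id = root.getAttribute("ProjectId")
status = root.getAttribute("Status")

print("Matches under root: %d" % len(nested))
print("Matches under document: %d" % len(from_doc))
print("Project Id: %s" % project_id)
print("Status: %s" % status)
```

With the question's file, the equivalent would be to call getAttribute on collection itself, or to call getElementsByTagName on DOMTree rather than on collection.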

UPDATE:

A simple for loop works fine to iterate over the Level elements and print their values, for example:

>>> tags = collection.getElementsByTagName("Level")
>>> for tag in tags:
...     print(tag.firstChild.nodeValue)
...
Info
Info
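
The same loop can be extended to read several values per log entry. The sketch below is one possible way to do it, not the only one: it walks each LogEntry and looks up its children by name. The helper child_text and the entries list are hypothetical names introduced for illustration, and the XML is a shortened copy of the question's file.

```python
from xml.dom.minidom import parseString

# Shortened copy of the question's XML, inlined so the sketch is self-contained.
XML = """<SummaryReport ProjectId="37f8d135-1f1d-4e57-9b7d-b084770c6bf5">
  <LogEntries>
    <LogEntry>
      <Level>Info</Level>
      <LogCode>PhaseSucceedInfo</LogCode>
      <Name>Phase</Name>
    </LogEntry>
    <LogEntry>
      <Level>Info</Level>
      <LogCode>PublishRequest</LogCode>
      <Name>PublishTocAndArticleInit</Name>
    </LogEntry>
  </LogEntries>
</SummaryReport>"""


def child_text(element, tag_name):
    """Return the text of the first descendant named tag_name, or None.

    Hypothetical helper: it also returns None for an empty element,
    where firstChild is None and .nodeValue would raise AttributeError.
    """
    matches = element.getElementsByTagName(tag_name)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.nodeValue


root = parseString(XML).documentElement

# Build one dict per LogEntry, searching only inside that entry.
entries = [
    {name: child_text(entry, name) for name in ("Level", "LogCode", "Name")}
    for entry in root.getElementsByTagName("LogEntry")
]

for entry in entries:
    print("%(Level)s %(LogCode)s %(Name)s" % entry)
```

Searching inside each LogEntry rather than from the root matters for tags such as Message, which in the question's XML appears both directly under SummaryReport and inside every LogEntry.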

