
python minidom parse XML feed with colon in results

I am trying to parse an XML feed that displays NFL schedules. One of the attributes is the game time, and it looks like this: GameTime="8:30 PM"

Here's a clip of what the XML looks like:

<Schedule Season="2010" Timezone="Eastern">
  <Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
  <Game gameId="2" Week="1" GameDate="2010-09-12" AwayTeam="MIA" HomeTeam="BUF" GameTime="1:00 PM"/>
  <Game gameId="3" Week="1" GameDate="2010-09-12" AwayTeam="DET" HomeTeam="CHI" GameTime="1:00 PM"/>
  <Game gameId="4" Week="1" GameDate="2010-09-12" AwayTeam="OAK" HomeTeam="TEN" GameTime="1:00 PM"/>
</Schedule>

Here's my code to read it:

import urllib2
from xml.dom import minidom

url = "http://example.com/schedule.xml"
dom = minidom.parse(urllib2.urlopen(url))

for node in dom.getElementsByTagName('Game'):
    print node.getAttribute('AwayTeam'),
    print node.getAttribute('HomeTeam'),
    print node.getAttribute('Week'),
    print node.getAttribute('gameId'),
    print node.getAttribute('GameDate'),
    print node.getAttribute('GameTime')

It prints what I'd expect until I add that last line. ETA: Once this last line is added, it goes from printing lines from the XML to printing nothing.

 print node.getAttribute('GameTime')

I'd assume it is because there is a colon in the returned data, but I can't find anything to help me either escape it so that it prints or ignore it.

Any help would be most appreciated.

I tried to reproduce your error, but it printed just fine for me.

I tried loading your data from a string within the Python module, from a named file, and from a file object. They all handled the colon just fine.
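As a quick sanity check of that claim, here is a minimal sketch showing that a colon inside an attribute value is ordinary character data and needs no escaping. The one-element SAMPLE string and the game_time helper are made up for illustration, modeled on the feed in the question; they are not part of the original code.

```python
from xml.dom import minidom

# Hypothetical one-element sample modeled on the feed in the question.
SAMPLE = '<Game gameId="1" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>'

def game_time(xml_text):
    """Return the GameTime attribute of the first Game element."""
    dom = minidom.parseString(xml_text)
    game = dom.getElementsByTagName('Game')[0]
    return game.getAttribute('GameTime')

# The colon comes back unchanged as part of the attribute value.
time_value = game_time(SAMPLE)
```

If this works on your machine too, the colon itself is unlikely to be the cause.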

The only difference now seems to be the way you get the data: urllib2.urlopen(). Perhaps the data returned by that function is not exactly what minidom expects. Or perhaps it does something with the colon character.

Here is the code I used (the data.xml file contains the same XML data as the triple-quoted string):

import urllib2
from xml.dom import minidom

src = """
<Schedule Season="2010" Timezone="Eastern">
  <Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
  <Game gameId="2" Week="1" GameDate="2010-09-12" AwayTeam="MIA" HomeTeam="BUF" GameTime="1:00 PM"/>
  <Game gameId="3" Week="1" GameDate="2010-09-12" AwayTeam="DET" HomeTeam="CHI" GameTime="1:00 PM"/>
  <Game gameId="4" Week="1" GameDate="2010-09-12" AwayTeam="OAK" HomeTeam="TEN" GameTime="1:00 PM"/>
</Schedule>
"""

def test_print(dom):
    for node in dom.getElementsByTagName('Game'):
        print node.getAttribute('AwayTeam'),
        print node.getAttribute('HomeTeam'),
        print node.getAttribute('Week'),
        print node.getAttribute('gameId'),
        print node.getAttribute('GameDate'),
        print node.getAttribute('GameTime')
    print ''

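# Test 1: parse from a string
dom = minidom.parseString(src)
test_print(dom)

# Test 2: parse from a named file
dom = minidom.parse('data.xml')
test_print(dom)

# Test 3: parse from an open file object
f = open('data.xml', 'r')
dom = minidom.parse(f)
test_print(dom)
f.close()

# Test 4: parse directly from the URL response
url = 'http://api.fantasyfootballnerd.com/ffnScheduleXML.php?apiKey=1'
dom = minidom.parse(urllib2.urlopen(url))
test_print(dom)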

Edits: Added test for URL provided by Mike (original post author).
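One way to narrow down whether the downloaded data is the problem is to read the response into a variable first, look at what actually came back, and only then hand it to minidom.parseString. The sketch below assumes that approach. The summarize_feed helper and the inline raw sample are hypothetical stand-ins so that it runs without network access; in the real script, raw would come from urllib2.urlopen(url).read().

```python
from xml.dom import minidom

def summarize_feed(raw):
    """Parse raw XML bytes and return one tuple of attributes per Game element."""
    dom = minidom.parseString(raw)
    fields = ('AwayTeam', 'HomeTeam', 'Week', 'gameId', 'GameDate', 'GameTime')
    return [tuple(node.getAttribute(name) for name in fields)
            for node in dom.getElementsByTagName('Game')]

# In the real script the bytes would come from the network, e.g.
#   raw = urllib2.urlopen(url).read()
# A hypothetical inline sample stands in here so the sketch runs offline.
raw = b'''<Schedule Season="2010" Timezone="Eastern">
  <Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
</Schedule>'''

# Look at the first bytes before parsing: is it XML, an HTML error page, or empty?
preview = repr(raw[:60])

# Collect the attributes instead of printing them piece by piece.
games = summarize_feed(raw)
```

Printing preview and games separately makes it easier to tell a download problem apart from a parsing or printing problem.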

