Python minidom: parsing an XML feed with a colon in the results
I am trying to parse an XML feed that displays NFL schedules. One of the attributes is the game time, and it looks like this: GameTime="8:30 PM"
Here's a clip of what the XML looks like:
<Schedule Season="2010" Timezone="Eastern">
<Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
<Game gameId="2" Week="1" GameDate="2010-09-12" AwayTeam="MIA" HomeTeam="BUF" GameTime="1:00 PM"/>
<Game gameId="3" Week="1" GameDate="2010-09-12" AwayTeam="DET" HomeTeam="CHI" GameTime="1:00 PM"/>
<Game gameId="4" Week="1" GameDate="2010-09-12" AwayTeam="OAK" HomeTeam="TEN" GameTime="1:00 PM"/>
</Schedule>
Here's my code to read it:
import urllib2
from xml.dom import minidom

url = "http://example.com/schedule.xml"
dom = minidom.parse(urllib2.urlopen(url))
for node in dom.getElementsByTagName('Game'):
    # A trailing comma keeps the next print on the same output line (Python 2)
    print node.getAttribute('AwayTeam'),
    print node.getAttribute('HomeTeam'),
    print node.getAttribute('Week'),
    print node.getAttribute('gameId'),
    print node.getAttribute('GameDate'),
    print node.getAttribute('GameTime')
It prints what I'd expect until I add that last line:
    print node.getAttribute('GameTime')
ETA: Once this last line is added, the output goes from lines of data from the XML to nothing at all.
I'd assume it is because there is a colon in the returned data, but I can't find anything that helps me either escape it so that it prints or ignore it.
Any help would be most appreciated.
I tried to reproduce your error, but it printed just fine for me.
I tried loading your data from a string within the Python module, from a named file, and from a file object. All three handled the colon just fine.
The only difference now seems to be the way you get the data: urllib2.urlopen(). Perhaps the data returned by that function is not exactly what minidom expects, or perhaps something happens to the colon character along the way.
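One way to check that hypothesis is to look at the bytes the server actually returns before minidom sees them. The following is only a minimal sketch, not the code used for this answer: it is written for Python 3 (where urllib2 became urllib.request), and the helper names fetch_bytes and parse_games are made up for illustration.

```python
from urllib.request import urlopen
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def fetch_bytes(url):
    """Download the raw response body (hypothetical helper, not called below)."""
    with urlopen(url) as response:
        return response.read()


def parse_games(raw):
    """Parse raw XML bytes and return one dict of attributes per <Game>."""
    try:
        dom = minidom.parseString(raw)
    except ExpatError as exc:
        # Include the start of the payload in the error so that an empty
        # body or a non-XML response is easy to recognise.
        raise ValueError("not well-formed XML (%s); body starts with %r"
                         % (exc, raw[:80]))
    return [dict(game.attributes.items())
            for game in dom.getElementsByTagName('Game')]


# Offline check with a one-game sample; for the real feed, pass
# fetch_bytes(url) instead of the sample bytes.
sample = b'<Schedule><Game gameId="1" GameTime="8:30 PM"/></Schedule>'
games = parse_games(sample)
print(games[0]['GameTime'])  # prints: 8:30 PM
```

If parse_games returns an empty list for the real feed, the response parsed as XML but contained no Game elements, which would also explain a loop that prints nothing.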
Here is the code I used (the data.xml file contains the same XML data as the triple-quoted string):
import urllib2
from xml.dom import minidom
src = """
<Schedule Season="2010" Timezone="Eastern">
<Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
<Game gameId="2" Week="1" GameDate="2010-09-12" AwayTeam="MIA" HomeTeam="BUF" GameTime="1:00 PM"/>
<Game gameId="3" Week="1" GameDate="2010-09-12" AwayTeam="DET" HomeTeam="CHI" GameTime="1:00 PM"/>
<Game gameId="4" Week="1" GameDate="2010-09-12" AwayTeam="OAK" HomeTeam="TEN" GameTime="1:00 PM"/>
</Schedule>
"""
def test_print(dom):
    for node in dom.getElementsByTagName('Game'):
        print node.getAttribute('AwayTeam'),
        print node.getAttribute('HomeTeam'),
        print node.getAttribute('Week'),
        print node.getAttribute('gameId'),
        print node.getAttribute('GameDate'),
        print node.getAttribute('GameTime')
    print ''
# Test 1: parse from a string
dom = minidom.parseString(src)
test_print(dom)
# Test 2: parse from a named file
dom = minidom.parse('data.xml')
test_print(dom)
# Test 3: parse from an open file object
f = open('data.xml', 'r')
dom = minidom.parse(f)
test_print(dom)
f.close()
# Test 4: parse from the URL
url = 'http://api.fantasyfootballnerd.com/ffnScheduleXML.php?apiKey=1'
dom = minidom.parse(urllib2.urlopen(url))
test_print(dom)
Edit: Added a test for the URL provided by Mike (the original poster).
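For readers on Python 3, where the print statement no longer exists, the string-based test could be sketched roughly as follows. This is an adaptation rather than the code that was run for this answer; the names FIELDS and game_lines are made up for illustration, and the sample is trimmed to two games.

```python
from xml.dom import minidom

SRC = """
<Schedule Season="2010" Timezone="Eastern">
<Game gameId="1" Week="1" GameDate="2010-09-09" AwayTeam="MIN" HomeTeam="NO" GameTime="8:30 PM"/>
<Game gameId="2" Week="1" GameDate="2010-09-12" AwayTeam="MIA" HomeTeam="BUF" GameTime="1:00 PM"/>
</Schedule>
"""

# Attribute names in the order the original loop prints them
FIELDS = ('AwayTeam', 'HomeTeam', 'Week', 'gameId', 'GameDate', 'GameTime')


def game_lines(dom):
    """Return one space-separated line per <Game> element."""
    return [' '.join(node.getAttribute(name) for name in FIELDS)
            for node in dom.getElementsByTagName('Game')]


lines = game_lines(minidom.parseString(SRC))
for line in lines:
    print(line)
# MIN NO 1 1 2010-09-09 8:30 PM
# MIA BUF 1 2 2010-09-12 1:00 PM
```

The colon comes back as an ordinary character in the attribute value, so no escaping is needed on the minidom side.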