How do I pull data from XML document from between XML tags in Django/Python?
I am loading an external XML file in my views.py file:
import urllib2
from xml.dom import minidom
from django.shortcuts import render_to_response

def test(request):
    url = urllib2.urlopen("http://someurl.com?xml")
    dom = minidom.parse(url)
    groups = dom.getElementsByTagName("group")
    deal_holder = []
    # Iterate over each DOM group element:
    for group in groups:
        # Iterate over each child node
        for groupChild in group.childNodes:
            deal_holder.append(groupChild)
    return render_to_response('folder/test.html', {'deal_holder': deal_holder})
The loaded XML file looks like this:
<page>
  <site>
    <siteid>25550</siteid>
    <sitename>
      <![CDATA[ Some Text Here ]]>
    </sitename>
    <sitelink>
      http://somelinkehere.com
    </sitelink>
    <timezone>
      <![CDATA[ Pacific Time ]]>
    </timezone>
  </site>
  <groups>
    <enablefeaturedgroup>OFF</enablefeaturedgroup>
    <group>
      <groupid>467246</groupid>
      <groupname>
        <![CDATA[ Today's Deal ]]>
      </groupname>
      <groupdescription>
        <![CDATA[ ]]>
      </groupdescription>
    </group>
    <group>
      <groupid>467247</groupid>
      <groupname>
        <![CDATA[ Past Deals ]]>
      </groupname>
      <groupdescription>
        <![CDATA[ ]]>
      </groupdescription>
    </group>
  </groups>
</page>
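The problem is that all the examples I have seen use something similar to what I am using, except that their XML tags usually look like this:
<weather:forecast day="Wed" date="14 Sep 2011" low="56" high="72" text="AM Clouds/PM Sun" code="30"/>
so they can pull values from attributes such as day="Wed", date="14 Sep 2011", low="56", and so on. The information I want to retrieve, however, sits between tags, as in <siteid>25550</siteid>.
Any suggestions or information would be greatly appreciated.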
Working with minidom is very similar to working with the DOM in JavaScript.
from xml.dom import minidom
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
a = """<page>
<site>
<siteid>25550</siteid>
<sitename>
<![CDATA[ Some Text Here ]]>
</sitename>
<sitelink>
http://somelinkehere.com
</sitelink>
<timezone>
<![CDATA[ Pacific Time ]]>
</timezone>
</site>
<groups>
<enablefeaturedgroup>OFF</enablefeaturedgroup>
<group>
<groupid>467246</groupid>
<groupname>
<![CDATA[ Today's Deal ]]>
</groupname>
<groupdescription>
<![CDATA[ ]]>
</groupdescription>
</group>
<group>
<groupid>467247</groupid>
<groupname>
<![CDATA[ Past Deals ]]>
</groupname>
<groupdescription>
<![CDATA[ ]]>
</groupdescription>
</group>
</groups>
</page>
"""
tree = minidom.parse(StringIO(a))
groups = tree.getElementsByTagName("group")
If you are using urllib, you do not need StringIO, because minidom's parse method takes a file-like object, and that is exactly what urllib.urlopen returns.
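To illustrate the point about file-like objects, here is a minimal Python 3 sketch. The `io.BytesIO` object is only a stand-in for the response that `urlopen` would return, and the one-line XML is a shortened excerpt of the sample document, not the real feed. It also shows that the text between `<siteid>` and `</siteid>` is reached through the element's child text node rather than through an attribute.

```python
import io
from xml.dom import minidom

# Stand-in for the object returned by urlopen(); minidom.parse only
# needs something with a read() method. The XML is a shortened excerpt.
fake_response = io.BytesIO(b"<page><site><siteid>25550</siteid></site></page>")

dom = minidom.parse(fake_response)

# An element carries no text of its own: the text between the tags is a
# separate text node stored as the element's first child.
siteid_element = dom.getElementsByTagName("siteid")[0]
siteid = siteid_element.firstChild.data
print(siteid)
```

Note that `firstChild.data` is only safe when the element holds nothing but plain text. Elements such as `<sitename>` in the sample also contain whitespace and a CDATA section, so they need the child nodes joined together.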
I would advise against passing this list of raw DOM nodes to the Django template system. You should parse it further first.
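# Build one dictionary per DOM group element:
group_dictionaries = []
for group in groups:
    group_dict = {}
    # childNodes also contains whitespace-only text nodes, so skip
    # every child that is not an element.
    for groupChild in group.childNodes:
        if groupChild.nodeType != groupChild.ELEMENT_NODE:
            continue
        # The text between the tags lives in the element's own child
        # nodes (text or CDATA sections), not on the element itself.
        text = "".join(node.data for node in groupChild.childNodes
                       if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))
        group_dict[groupChild.tagName] = text.strip()
    group_dictionaries.append(group_dict)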
Now in the template:
{% for group in group_dictionaries %}
{{ group.groupid }}
{{ group.groupname }}
etc.
{% endfor %}
This way you keep their values in a list of dictionaries.
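As a rough end-to-end sketch of the list-of-dictionaries idea in Python 3, the snippet below parses a shortened copy of the sample XML and prints the structure the template would receive. The names `element_text` and `groups_to_dicts` are hypothetical helpers invented for this illustration, not part of minidom.

```python
from xml.dom import minidom

# Shortened copy of the sample document (assumption: same tag layout).
SAMPLE = """<groups>
<group>
<groupid>467246</groupid>
<groupname>
<![CDATA[ Today's Deal ]]>
</groupname>
</group>
<group>
<groupid>467247</groupid>
<groupname>
<![CDATA[ Past Deals ]]>
</groupname>
</group>
</groups>"""


def element_text(element):
    """Join the text and CDATA children of an element (hypothetical helper)."""
    wanted = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    parts = [node.data for node in element.childNodes if node.nodeType in wanted]
    return "".join(parts).strip()


def groups_to_dicts(dom):
    """Return one {tag name: text} dictionary per <group> element."""
    result = []
    for group in dom.getElementsByTagName("group"):
        # Skip the whitespace-only text nodes between the child elements.
        children = [c for c in group.childNodes if c.nodeType == c.ELEMENT_NODE]
        result.append({child.tagName: element_text(child) for child in children})
    return result


group_dictionaries = groups_to_dicts(minidom.parseString(SAMPLE))
print(group_dictionaries)
```

Because each entry is a plain dictionary keyed by tag name, template lookups such as `{{ group.groupid }}` resolve without the template having to know anything about DOM nodes.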
Using lxml, you can do the following:
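import lxml.etree

tree = lxml.etree.parse("http://someurl.com")
# XPath "//site" selects every <site> element in the document
sites = tree.xpath("//site")
for site in sites:
    # .text holds the text between <siteid> and </siteid>
    siteid = site.find("siteid").text
    print(siteid)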
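If lxml is not installed, a similar approach should work with the standard library's `xml.etree.ElementTree`, which offers a comparable `find`/`.text` interface. This is a swapped-in alternative rather than the lxml code above, sketched in Python 3 against a shortened copy of the sample XML instead of a live URL.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: same tag layout).
SAMPLE = """<page>
<site>
<siteid>25550</siteid>
<sitename>
<![CDATA[ Some Text Here ]]>
</sitename>
</site>
</page>"""

root = ET.fromstring(SAMPLE)

site_records = []
for site in root.iter("site"):
    # findtext returns the text between the child's tags; ElementTree
    # folds CDATA sections into that text, so strip() is enough to
    # drop the surrounding whitespace.
    site_records.append({
        "siteid": site.findtext("siteid").strip(),
        "sitename": site.findtext("sitename").strip(),
    })
print(site_records)
```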