Parsing XML with Python from URL using minidom
I am having trouble parsing the following XML from a URL.
Sample XML at my URL:
<?xml version="1.0" encoding="utf-8"?>
<Documents>
    <class>
        <mid name="yyyyyyyyyyyyy"></mid>
        <person name="yyyyyyyyyy"></person>
        <url name="yyyyyyyyy"></url>
    </class>
    <class>
        <mid name="xxxxx"></mid>
        <person name="xxxxxxxxxx"></person>
        <url name="xxxxxxxxxxx"></url>
    </class>
</Documents>
Below is my Python code:
from urllib.request import urlopen
from xml.dom import minidom

def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    for i in person:
        print(i.attributes['name'].value)
Within the for loop, I want to print the values of the person and url tags that belong to the same parent class element.
I tried the following approach, iterating over all three lists at once, but I get a "too many values to unpack" error:
def staff_list(request):
    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    mid = dom.getElementsByTagName('mid')
    url = dom.getElementsByTagName('url')
    for i, j, k in person, mid, url:  # this line raises the unpacking error
        print(i.attributes['name'].value, j.attributes['name'].value, k.attributes['name'].value)
Any suggestions?
You want to use zip() to combine the elements, I think:
for i, j, k in zip(person, mid, url):
    print(i.attributes['name'].value, j.attributes['name'].value, k.attributes['name'].value)
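To see the zip() suggestion work end to end, here is a minimal sketch that inlines the sample XML from the question instead of fetching it from a URL. The names SAMPLE_XML and zip_names are made up for this illustration and are not part of the original code.

```python
from xml.dom import minidom

# Sample document from the question, inlined so the sketch runs offline.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Documents>
  <class>
    <mid name="yyyyyyyyyyyyy"></mid>
    <person name="yyyyyyyyyy"></person>
    <url name="yyyyyyyyy"></url>
  </class>
  <class>
    <mid name="xxxxx"></mid>
    <person name="xxxxxxxxxx"></person>
    <url name="xxxxxxxxxxx"></url>
  </class>
</Documents>"""


def zip_names(xml_bytes):
    """Return (person, mid, url) name tuples, paired by document position."""
    dom = minidom.parseString(xml_bytes)
    persons = dom.getElementsByTagName('person')
    mids = dom.getElementsByTagName('mid')
    urls = dom.getElementsByTagName('url')
    # zip() yields one tuple per position: (persons[0], mids[0], urls[0]), ...
    return [
        (p.getAttribute('name'), m.getAttribute('name'), u.getAttribute('name'))
        for p, m, u in zip(persons, mids, urls)
    ]


rows = zip_names(SAMPLE_XML)
for row in rows:
    print(*row)
```

Keep in mind that zip() pairs items purely by position and stops at the shortest list, so if any class element is missing one of the three tags, the values can end up matched to the wrong class.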
Do yourself a big favour though and use the ElementTree API instead; that API is far more Pythonic and easier to use than the XML DOM API.
If you want to stick with minidom, you can change your loop to:
# Look up each tag inside its own <class> element so the values stay grouped
for cls in dom.getElementsByTagName('class'):
    person = cls.getElementsByTagName('person')[0]
    mid = cls.getElementsByTagName('mid')[0]
    url = cls.getElementsByTagName('url')[0]
    print(person.attributes['name'].value)
    print(mid.attributes['name'].value)
    print(url.attributes['name'].value)
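The loop above assumes every class element contains all three tags; indexing with [0] raises IndexError otherwise. As a rough sketch of one way to tolerate missing children, the version below returns None for an absent tag. The helper names first_name and names_per_class, and the shortened test document, are invented for this example.

```python
from xml.dom import minidom


def first_name(parent, tag):
    """Return the name attribute of the first <tag> under parent, or None."""
    nodes = parent.getElementsByTagName(tag)
    return nodes[0].getAttribute('name') if nodes else None


def names_per_class(xml_bytes):
    """Build one dict per <class> element, keyed by child tag name."""
    dom = minidom.parseString(xml_bytes)
    return [
        {tag: first_name(cls, tag) for tag in ('person', 'mid', 'url')}
        for cls in dom.getElementsByTagName('class')
    ]


# Hypothetical input: the second <class> has no <url> child.
xml_bytes = b"""<Documents>
  <class><mid name="m1"/><person name="p1"/><url name="u1"/></class>
  <class><mid name="m2"/><person name="p2"/></class>
</Documents>"""

result = names_per_class(xml_bytes)
for entry in result:
    print(entry)
```

Because each lookup is scoped to a single class element, a missing tag only affects that entry rather than shifting every later value.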
As @Martijn Pieters said, have a look at ElementTree as an alternative API. For example:
import xml.etree.ElementTree as ET

# xmlstr is assumed to hold the XML document as a string
documents = ET.fromstring(xmlstr)
for cls in documents.iter('class'):
    person = cls.find('person')
    mid = cls.find('mid')
    url = cls.find('url')
    print(person.get('name'), mid.get('name'), url.get('name'))
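Since the question reads the XML from a URL rather than from a string, ET.parse() may be a better fit than ET.fromstring(), because it accepts any file-like object, including the response returned by urlopen(). The sketch below uses io.BytesIO as a stand-in for the network response so it runs offline; class_names and fetch_names are hypothetical helper names, and fetch_names is defined but not called here.

```python
import io
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def class_names(source):
    """Parse a file-like object and return (person, mid, url) names per <class>."""
    root = ET.parse(source).getroot()
    rows = []
    for cls in root.iter('class'):
        # find() returns the first matching direct child of this <class>
        rows.append((
            cls.find('person').get('name'),
            cls.find('mid').get('name'),
            cls.find('url').get('name'),
        ))
    return rows


def fetch_names(url):
    """Same as class_names, but reads the document from a URL (not called below)."""
    with urlopen(url) as response:
        return class_names(response)


# Stand-in for the HTTP response body, assumed to match the question's format.
fake_response = io.BytesIO(b"""<?xml version="1.0" encoding="utf-8"?>
<Documents>
  <class><mid name="m1"/><person name="p1"/><url name="u1"/></class>
  <class><mid name="m2"/><person name="p2"/><url name="u2"/></class>
</Documents>""")

names = class_names(fake_response)
for person, mid, url in names:
    print(person, mid, url)
```

Note that find() returns None when a child is missing, so a real document with incomplete class elements would need a check before calling get().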
I would use XPath and lxml.html. A minimalist approach:
import lxml.html as lh
doc = lh.parse('test.xml')

In [70]: persons = doc.xpath('.//person/@name')
In [71]: urls = doc.xpath('.//person[@name]/following-sibling::url/@name')
In [72]: mids = doc.xpath('.//person[@name]/preceding-sibling::mid/@name')
In [73]: [[p, m, u] for p, m, u in zip(persons, mids, urls)]
Out[73]:
[['yyyyyyyyyy', 'yyyyyyyyyyyyy', 'yyyyyyyyy'],
 ['xxxxxxxxxx', 'xxxxx', 'xxxxxxxxxxx']]