Parsing XML with Python from URL using minidom

I have a problem parsing the following XML from a URL.

Sample XML at my URL:

<?xml version="1.0" encoding="utf-8"?> 
<Documents>
    <class>
        <mid name="yyyyyyyyyyyyy"></mid>
        <person name="yyyyyyyyyy"></person>
        <url name="yyyyyyyyy"></url>
    </class>
    <class>
        <mid name="xxxxx"></mid>
        <person name="xxxxxxxxxx"></person>
        <url name="xxxxxxxxxxx"></url>
    </class>
</Documents>

Below is my Python code:

from urllib.request import urlopen
from xml.dom import minidom

def staff_list(request):

    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    for i in person:
        print(i.attributes['name'].value)

Within the for loop I want to print the name values of the person and url tags that belong to the same parent class element.

I tried the following iteration, but I get a "too many values to unpack" error:

def staff_list(request):

    url = 'http://path.to.url/'
    dom = minidom.parse(urlopen(url))
    person = dom.getElementsByTagName('person')
    mid = dom.getElementsByTagName('mid')
    url = dom.getElementsByTagName('url')
    # the next line is the one that raises the unpacking error
    for i,j,k in person,mid,url:
        print(i.attributes['name'].value, j.attributes['name'].value, k.attributes['name'].value)

Any suggestions?

You want to use zip() to combine the elements, I think:

for i,j,k in zip(person, mid, url):

Do yourself a big favour though and use the ElementTree API instead; that API is far more Pythonic and easier to use than the XML DOM API.
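To illustrate the zip() suggestion, here is a minimal sketch in Python 3 syntax. It assumes a shortened inline copy of the XML (the name values m1, p1, u1 and so on are made up for the example) and parses it with minidom.parseString instead of fetching a URL. Note that zip() pairs items purely by position, so this only lines up correctly if every class element contains exactly one person, one mid and one url.

```python
from xml.dom import minidom

# Hypothetical, shortened stand-in for the XML served at the URL.
SAMPLE = (
    "<Documents>"
    "<class><mid name='m1'/><person name='p1'/><url name='u1'/></class>"
    "<class><mid name='m2'/><person name='p2'/><url name='u2'/></class>"
    "</Documents>"
)

dom = minidom.parseString(SAMPLE)
person = dom.getElementsByTagName('person')
mid = dom.getElementsByTagName('mid')
url = dom.getElementsByTagName('url')

# "for i, j, k in person, mid, url" loops over the tuple (person, mid, url),
# so each step hands over one whole node list and tries to unpack it into
# three names. zip() instead yields one (person, mid, url) triple per position.
rows = [
    (p.getAttribute('name'), m.getAttribute('name'), u.getAttribute('name'))
    for p, m, u in zip(person, mid, url)
]

for row in rows:
    print(*row)
```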

If you want to stick with minidom you can change your loop to:

for cls in dom.getElementsByTagName('class'):
    person = cls.getElementsByTagName('person')[0]
    mid = cls.getElementsByTagName('mid')[0]
    url = cls.getElementsByTagName('url')[0]

    print(person.attributes['name'].value)
    print(mid.attributes['name'].value)
    print(url.attributes['name'].value)
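The per-class loop above indexes [0] directly, which raises IndexError if a class element lacks one of the children. A rough Python 3 sketch of a more defensive variant follows; the helper name class_rows and the inline sample (with a deliberately missing url) are assumptions made for illustration. Since minidom.parse accepts a filename or a file-like object, the same helper should also work with the object returned by urlopen(url).

```python
import io
from xml.dom import minidom

def class_rows(source):
    """Return (person, mid, url) name triples, one per <class> element.

    Hypothetical helper: source is a filename or a file-like object.
    """
    dom = minidom.parse(source)
    rows = []
    for cls in dom.getElementsByTagName('class'):
        names = []
        for tag in ('person', 'mid', 'url'):
            found = cls.getElementsByTagName(tag)
            # Use None as a placeholder when the child element is missing.
            names.append(found[0].getAttribute('name') if found else None)
        rows.append(tuple(names))
    return rows

# Made-up sample; the second <class> has no <url> child.
sample = io.BytesIO(
    b"<Documents>"
    b"<class><mid name='m1'/><person name='p1'/><url name='u1'/></class>"
    b"<class><mid name='m2'/><person name='p2'/></class>"
    b"</Documents>"
)

rows = class_rows(sample)
for row in rows:
    print(row)
```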

As @Martijn Pieters said, have a look at ElementTree as an alternative API. For example:

import xml.etree.ElementTree as ET

# xmlstr is assumed to hold the XML document as a string
documents = ET.fromstring(xmlstr)
for cls in documents.iter('class'):
    # find() returns the first matching child of this <class>
    person = cls.find('person')
    mid = cls.find('mid')
    url = cls.find('url')

    print(person.get('name'), mid.get('name'), url.get('name'))
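The snippet above leaves xmlstr undefined, so here is a self-contained Python 3 sketch of the same ElementTree idea, assuming xmlstr simply holds the sample document from the question. It collects one dict per class element rather than printing, and guards against find() returning None when a child is absent; the records name and the dict layout are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Assumed input: the sample document from the question, as a string.
xmlstr = """<Documents>
    <class>
        <mid name="yyyyyyyyyyyyy"></mid>
        <person name="yyyyyyyyyy"></person>
        <url name="yyyyyyyyy"></url>
    </class>
    <class>
        <mid name="xxxxx"></mid>
        <person name="xxxxxxxxxx"></person>
        <url name="xxxxxxxxxxx"></url>
    </class>
</Documents>"""

documents = ET.fromstring(xmlstr)

records = []
for cls in documents.iter('class'):
    record = {}
    for tag in ('person', 'mid', 'url'):
        child = cls.find(tag)  # None when this <class> has no such child
        record[tag] = child.get('name') if child is not None else None
    records.append(record)

for record in records:
    print(record['person'], record['mid'], record['url'])
```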

I would use XPath and lxml.html. A minimalist approach:

import lxml.html as lh
doc = lh.parse('test.xml')

In [70]: persons = doc.xpath('.//person/@name')

In [71]: urls=doc.xpath('.//person[@name]/following-sibling::url/@name')

In [72]: mids=doc.xpath('.//person[@name]/preceding-sibling::mid/@name')

In [73]: [[p,m,u]for p,m,u in zip(persons, mids, urls)]
Out[73]: 
[['yyyyyyyyyy', 'yyyyyyyyyyyyy', 'yyyyyyyyy'],
 ['xxxxxxxxxx', 'xxxxx', 'xxxxxxxxxxx']]
