Parsing XML with BeautifulSoup or minidom

Question

I have XML like this:

#filename sample.xml
<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>

I want to extract 'data1' and 'data2'. I'm trying something like this:

from bs4 import BeautifulSoup

f=open('sample.xml')
fdata=f.read()
xmldata=BeautifulSoup(fdata)
print (xmldata.tag.tag1.tag2.tag3.tag4["data"])

But it's throwing an error:

AttributeError: 'NoneType' object has no attribute 'tag4'

Answer 1

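The print call fails because there are several tag2 elements. Attribute-style access such as xmldata.tag.tag1.tag2 returns only the first tag2, which has no tag3 child, so .tag3 evaluates to None and the following .tag4 lookup raises the AttributeError. One solution is to retrieve all of the tags with .find_all('tag2') (spelled .findAll in older BeautifulSoup code).

Here is a working example: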

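#!/usr/bin/env python3

from bs4 import BeautifulSoup

# Read the whole document; the context manager closes the file afterwards.
with open('sample.xml') as f:
    fdata = f.read()

# Name the parser explicitly; "html.parser" ships with Python.
xmldata = BeautifulSoup(fdata, 'html.parser')

# Collect every <tag2>, not just the first one.
alltags2 = xmldata.tag.tag1.find_all('tag2')

for tag2 in alltags2:
    # Only the <tag2> that contains a <tag3> yields anything here.
    alltags3 = tag2.find_all('tag3')
    for tag3 in alltags3:
        alltags4 = tag3.find_all('tag4')
        for tag4 in alltags4:
            print('The data I got was: "%s"' % tag4["data"])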

Kind regards,
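
To see why the dotted lookup in the question returns None, it may help to inspect what the first tag2 actually is. The following is a minimal sketch, assuming bs4 is installed and using the built-in "html.parser"; the SAMPLE string simply inlines sample.xml, and the names SAMPLE, soup, first_tag2 and values are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# The contents of sample.xml from the question, inlined so the sketch is self-contained.
SAMPLE = """<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Dotted access behaves like find(): it returns the first match only.
first_tag2 = soup.tag.tag1.tag2
print(first_tag2["property"])  # something
print(first_tag2.tag3)         # None, because the first <tag2> is empty

# find_all() searches all descendants, so the nested loops can be flattened.
values = [tag4["data"] for tag4 in soup.find_all("tag4")]
print(values)                  # ['data1', 'data2']
```

The flattened find_all("tag4") gives up the explicit tag2/tag3 nesting check that the loops above perform, so it fits only when any tag4 in the document is acceptable.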

Answer 2

One possible way is to use the select() method, passing a CSS selector as its argument. For example, if you want to select strictly those <tag4> elements that have this exact ancestor hierarchy:

.....
xmldata=BeautifulSoup(fdata)
for tag4 in xmldata.select("tag > tag1 > tag2 > tag3 > tag4"):
    print(tag4["data"])

The above will print the following:

data1
data2
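
A self-contained variant of the snippet above may make the role of the ">" combinator clearer. This is a sketch under assumptions: bs4 is installed with CSS selector support, the built-in "html.parser" is used, and SAMPLE, soup, strict_values and skipping are illustrative names that do not appear in the original answer.

```python
from bs4 import BeautifulSoup

# The contents of sample.xml from the question, inlined for a runnable sketch.
SAMPLE = """<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# ">" means "direct child of", so every level of the hierarchy must match.
strict_values = [t["data"] for t in soup.select("tag > tag1 > tag2 > tag3 > tag4")]
print(strict_values)  # ['data1', 'data2']

# Skipping levels matches nothing: no <tag4> is a direct child of <tag1>.
skipping = soup.select("tag1 > tag4")
print(len(skipping))  # 0
```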

Or, if you need all <tag4> elements wherever they are located in the XML, you can simply use xmldata.select("tag4").
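
The title also mentions minidom, which neither answer demonstrates. As a rough standard-library counterpart to the "wherever they are located" lookup, getElementsByTagName searches all descendants of the document. The sketch below is an assumption about how one might write it, not something taken from the answers; SAMPLE, dom and values are illustrative names.

```python
from xml.dom import minidom

# The contents of sample.xml from the question, inlined for a runnable sketch.
SAMPLE = """<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>"""

# parseString() builds a DOM from a string; minidom.parse('sample.xml') reads a file.
dom = minidom.parseString(SAMPLE)

# getElementsByTagName() returns every <tag4> in document order, at any depth.
values = [node.getAttribute("data") for node in dom.getElementsByTagName("tag4")]
print(values)  # ['data1', 'data2']
```

Unlike the HTML-oriented parsers BeautifulSoup can use, minidom is a strict XML parser, so it raises an error on input that is not well-formed.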

2 answers

Answer 1
3 2015-03-21 20:56:49

Answer 2
2 2015-03-22 09:53:21
