
Parsing an XML file using beautifulsoup4

I want to extract the stuff tags from profile name="4" only. The code below extracts everything under profile name="4", but is there a way to go from there and collect all the stuff tags, or do I have to use split to get the text inside each stuff tag? My actual XML file is much longer, so using split is doable, but parsing the data that way would take much longer.

This is the Python code:

import bs4 as bs

# Open the XML file and let bs4 parse it
with open('file.xml') as xml_file:
    soup = bs.BeautifulSoup(xml_file, 'html.parser')

# Extract and print every <profile> tag with name="4", including its children
stuff = soup.find_all('profile', {'name': "4"})
print(stuff)

This is the XML file, called file.xml. I want to extract the stuff tags from profile name="4":

<profiles>
    <profile name="1">
        <content>apple</content>
    </profile>
    <profile name="2">
        <content>peas</content>
    </profile>
    <profile name="3">
        <stuff>bear</stuff>
    </profile>
    <profile name="4">
        <content>cat</content>
        <data>
            <stuff>fish</stuff>
        </data>
        <stuff>hat</stuff>
    </profile>
</profiles>

Do the same for the inner tags:

print([i.find_all('stuff') for i in stuff])
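The expression above yields one inner list per matching profile. If a single flat list of tags is preferred, a CSS selector is one possible alternative. The following is only a minimal sketch: it assumes a bs4 version that provides select(), and it parses a shortened inline copy of the data (the name SAMPLE is made up for this example) instead of reading file.xml.

```python
import bs4 as bs

# Shortened inline sample standing in for file.xml (SAMPLE is a hypothetical name).
SAMPLE = """
<profiles>
    <profile name="3"><stuff>bear</stuff></profile>
    <profile name="4">
        <content>cat</content>
        <data><stuff>fish</stuff></data>
        <stuff>hat</stuff>
    </profile>
</profiles>
"""

soup = bs.BeautifulSoup(SAMPLE, 'html.parser')

# Descendant selector: every <stuff> at any depth under <profile name="4">.
flat = soup.select('profile[name="4"] stuff')

# Render each matched tag back to markup so the result is easy to inspect.
flat_markup = [str(tag) for tag in flat]
print(flat_markup)
```

The selector matches nested tags as well, so the stuff tag inside data is included while the one under profile name="3" is not.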

If you just need the text inside the tags:

for i in stuff:
    for x in i.find_all('stuff'):
        print(x.get_text())

Output:

fish
hat
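For reference, the steps above can be combined into one self-contained sketch. It assumes the XML from the question is held in an inline string rather than read from file.xml, and the names XML_TEXT and stuff_texts are invented for this example; it is one way to arrange the calls, not the only one.

```python
import bs4 as bs

# Inline copy of the question's file.xml (XML_TEXT is a hypothetical name).
XML_TEXT = """
<profiles>
    <profile name="1">
        <content>apple</content>
    </profile>
    <profile name="2">
        <content>peas</content>
    </profile>
    <profile name="3">
        <stuff>bear</stuff>
    </profile>
    <profile name="4">
        <content>cat</content>
        <data>
            <stuff>fish</stuff>
        </data>
        <stuff>hat</stuff>
    </profile>
</profiles>
"""


def stuff_texts(xml_text, profile_name):
    """Return the text of every <stuff> tag nested under the named <profile>."""
    soup = bs.BeautifulSoup(xml_text, 'html.parser')
    texts = []
    # Outer search: only <profile> tags whose name attribute matches.
    for profile in soup.find_all('profile', {'name': profile_name}):
        # Inner search: find_all is recursive, so nested <stuff> tags are found too.
        for tag in profile.find_all('stuff'):
            texts.append(tag.get_text())
    return texts


print(stuff_texts(XML_TEXT, "4"))
```

Because the inner search runs only on the matched profile, the stuff tag under profile name="3" is left out when asking for profile "4".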
