Problem with XML-parsing using Beautiful Soup

When trying to replace some elements in an XML document with Beautiful Soup, I found out that I have to use soup.find_all().string.replace_with() to replace the desired elements. However, I ran into the problem that the soup.find_all() method only returns elements of type None.

So I tried to break my problem down to an XML document that is as basic as possible:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    print('Element {} has type {}.'.format(elem, elem.type))

This gives exactly the same result:

Element <test tag="0"/> has type None.
Element <test tag="1"/> has type None.

I'd be happy if someone could point out where the problem lies.

Thanks in advance.

I'm not sure exactly what output you're looking for, but you can replace tag attributes the following way:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

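replace_list = ['0']  # attribute values that should be replaced
replacement = '2'     # value to write in their place

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    # elem['tag'] reads the XML attribute named "tag"
    if elem['tag'] in replace_list:
        elem['tag'] = replacement
    #print('Element {} has type {}.'.format(elem, elem.name))

# Serialize the modified tree back to a string
xml = str(soup)

print(xml)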

Output:

<?xml version="1.0" encoding="utf-8"?>
<xml>
<test tag="2"/>
<test tag="1"/>
</xml>
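
As for why the question's code printed None: on a Beautiful Soup element, attribute access such as elem.type is shorthand for elem.find("type"), that is, a search for a child tag named type. The elements here have no such child, so the result is None even though find_all() did return real elements. The sketch below illustrates this and also shows the .string.replace_with() call from the question. The text content "old text" in the second element and the fallback to html.parser when lxml is missing are assumptions added for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumption: the second element carries text so replace_with() has a target
xml = """
<xml>
    <test tag="0"/>
    <test tag="1">old text</test>
</xml>"""

try:
    soup = BeautifulSoup(xml, "xml")  # the 'xml' feature requires lxml
except FeatureNotFound:
    # Assumption: fall back to the built-in parser so the sketch still runs
    soup = BeautifulSoup(xml, "html.parser")

for elem in soup.find_all("test"):
    # elem.type means elem.find("type"); there is no <type> child, so None
    assert elem.type is None

    # elem.name is the tag name, type(elem) is the Python class of the object
    print(elem.name, type(elem).__name__, elem.attrs)

    # Self-closing elements have no text node, so elem.string is None there
    if elem.string is not None:
        elem.string.replace_with("new text")

# Collect the text of each element after the replacement
texts = [elem.string for elem in soup.find_all("test")]
print(texts)
```

The guard on elem.string matters for input like the one in the question: calling replace_with() on the None returned for a self-closing element would raise an AttributeError.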
