Problem with XML parsing using Beautiful Soup

Question

When trying to replace some elements in an XML document with Beautiful Soup, I found that I have to use soup.find_all().string.replace_with() to replace the desired elements. However, I ran into the problem that the soup.find_all() method only returns elements of type None.

So I tried to break my problem down to an XML document that is as basic as possible:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    print('Element {} has type {}.'.format(elem, elem.type))

Which gives the exact same thing:

Element <test tag="0"/> has type None.
Element <test tag="1"/> has type None.

I'd be happy if someone could point out where the problem lies.

Thanks in advance.

Answer 1

Well, I'm not sure exactly what you're looking for as output, but you can replace tag attributes the following way:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

replace_list = ['0']
replacement = '2'

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    if elem['tag'] in replace_list:
        elem['tag'] = replacement
    #print('Element {} has type {}.'.format(elem, elem.name))

xml = str(soup)

print(xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<xml>
<test tag="2"/>
<test tag="1"/>
</xml>
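
As a side note on the None in the question: it most likely does not come from find_all() at all. In Beautiful Soup, attribute access on a tag such as elem.type is shorthand for elem.find("type"), so it looks for a child element named <type> and returns None when there is none. The sketch below shows this next to type(elem) and elem.name, and then a guarded use of .string.replace_with(). The second <test> element with text content is a hypothetical addition for illustration (it is not in the original XML), and the 'xml' parser is assumed to be available via lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical variant of the question's XML: the second element
# carries text so that .string has something to replace.
xml = """
<xml>
    <test tag="0"/>
    <test tag="1">old text</test>
</xml>"""

soup = BeautifulSoup(xml, "xml")  # assumes lxml is installed
elems = soup.find_all("test")     # a list-like ResultSet of Tag objects

for elem in elems:
    # elem.type means elem.find("type"): no <type> child exists, so it is None.
    # type(elem) gives the Python class, elem.name gives the tag name.
    print(elem.type, type(elem).__name__, elem.name)

# find_all() returns several tags, so .string has to be read per element.
# .string is None for an empty element, so guard before replace_with().
for elem in elems:
    if elem.string is not None:
        elem.string.replace_with("new text")

print(soup.find("test", {"tag": "1"}).string)
```

So the elements returned by find_all() are ordinary Tag objects; only the .type lookup yields None. For empty elements like the ones in the question there is no text node to replace, which is why changing the attribute value (as in the code above) is the approach that applies there.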

Answer 1 (accepted, 2019-02-28 14:33:28)