
Problem with XML-parsing using Beautiful Soup

When trying to replace some elements in an XML document with Beautiful Soup, I found that I have to call soup.find_all() and then use .string.replace_with() on each result to replace the desired elements. However, I ran into the problem that every element returned by soup.find_all() appears to have type None.

So I tried to break my problem down to an XML that is as basic as possible:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    print('Element {} has type {}.'.format(elem, elem.type))

This shows the same behavior:

Element <test tag="0"/> has type None.
Element <test tag="1"/> has type None.

I'd be happy if someone could point out where the problem lies.

Thanks in advance
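
On the None itself: as far as I can tell, this is not a parsing failure. In Beautiful Soup, accessing an unknown attribute on a tag, such as elem.type, is shorthand for elem.find("type"), so it looks for a child tag named <type>. Neither <test> element has such a child, so the result is None. Here is a minimal sketch of the difference, assuming bs4 and lxml (needed for the 'xml' parser) are installed. It uses elem.name for the tag name and Python's built-in type() for the object class:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

soup = BeautifulSoup(xml, "xml")
elems = soup.find_all("test")

for elem in elems:
    # elem.type searches for a child tag called <type>; there is none, hence None.
    # elem.name is the tag name, and type(elem) is the Python class of the object.
    print(elem.name, type(elem).__name__, elem.type)
```

So the elements are found correctly; only the attribute used to inspect them was misleading.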

Well, I'm not sure exactly what output you're looking for, but you can replace tag attributes in the following way:

from bs4 import BeautifulSoup as BS

xml = """
<xml>
    <test tag="0"/>
    <test tag="1"/>
</xml>"""

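replace_list = ['0']  # attribute values that should be replaced
replacement = '2'     # value to write instead

soup = BS(xml, 'xml')
for elem in soup.find_all("test"):
    # Tag attributes are read and written like dictionary entries
    if elem['tag'] in replace_list:
        elem['tag'] = replacement
    #print('Element {} has type {}.'.format(elem, elem.name))

# Serialize the modified tree back to a string
xml = str(soup)

print(xml)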

Output:

<?xml version="1.0" encoding="utf-8"?>
<xml>
<test tag="2"/>
<test tag="1"/>
</xml>
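
If the goal is to replace the text inside elements rather than an attribute, the .string.replace_with() approach mentioned in the question can be sketched as follows. Note that find_all() returns a list-like result, so .string has to be accessed on each element, not on the result of find_all() itself. The input here is hypothetical: unlike the original example, the elements contain text ("old", "keep"), because .string is None for empty elements such as <test tag="0"/>.

```python
from bs4 import BeautifulSoup

# Hypothetical input for illustration: these elements contain text.
xml = """
<xml>
    <test tag="0">old</test>
    <test tag="1">keep</test>
</xml>"""

soup = BeautifulSoup(xml, "xml")
for elem in soup.find_all("test"):
    # Guard against empty elements, where elem.string is None.
    if elem.string is not None and elem.string == "old":
        elem.string.replace_with("new")

# Collect the text of each <test> element after the replacement.
texts = [elem.get_text() for elem in soup.find_all("test")]
print(texts)
```

The None check matters for the XML in the question: calling .string.replace_with() on a self-closing element would raise an AttributeError.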

