
How do I replace the text of a nested tag in BeautifulSoup4?

I'm working on a program that converts HTML into Markdown for a Reddit comment. For instance, if I have:

<i> This is Italics <b> but this is bold and italics</b>  and back to italics </i>

I'd want to format it as

*This is Italics* ***but this is bold and italics*** *and back to italics*

Reddit would then render it as: This is Italics but this is bold and italics and back to italics

I'm having trouble finding all tags inside tags and replacing them with the right amount of asterisks without messing up the formatting. I've tried several things but the most recent is:

italics = soup.find_all('i')
for i in range(len(italics)):
    bold = italics[i].find_all('b')
    for j in bold:
        bold[i].replace_with('***' + bold[i].text + '***')

But I get errors when trying to edit nested tags. I don't want to wrap every bold tag in ***, only the ones inside italics, so that the formatting is preserved; the rest I can change to **.
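The errors most likely come from indexing `bold` with the outer counter `i` instead of using the inner loop variable `j`: once `i` is past the end of `bold`, the lookup fails, and replacing the same tag twice fails as well. A minimal sketch of the loop with that fixed, assuming the sample markup from the question with a properly closed `</i>` and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Sample input from the question (assumed to end with a closing </i>).
html = "<i>This is Italics <b>but this is bold and italics</b> and back to italics</i>"
soup = BeautifulSoup(html, "html.parser")

for italic in soup.find_all("i"):
    # Only <b> tags nested inside this <i> are touched.
    for bold in italic.find_all("b"):
        # Use the loop variable itself rather than indexing the list.
        bold.replace_with("***" + bold.get_text() + "***")

result = str(soup)
print(result)
```

This only handles the nested bold tags; the surrounding `<i>` is left in place for a later step.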

Perhaps something easier (to understand) like this.

italics = soup.find_all('i')
for i in italics:
    print(i.b)
    if i.b:
        i.b.replace_with('***' +i.b.text +'***')

print(soup)

And here is the whole code. It's crude, but it works:

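italics = soup.find_all('i')
for i in italics:
    print(i.b)  # debug output: the first nested <b>, or None
    if i.b:
        # Close the italics, wrap the bold text in ***, then reopen the italics
        i.b.replace_with('* ***' + i.b.text + '*** *')
    # Wrap the (now text-only) italic element in single asterisks
    i.replace_with('*' + i.text + '*')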

print(soup)
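Note that leading or trailing whitespace inside the tags ends up between the asterisks and the text, which Markdown may not treat as emphasis. Below is a hedged sketch of a variant that walks the direct children of each `<i>` and strips whitespace from every piece. The helper names `italics_to_markdown` and `html_to_reddit` are made up for illustration, and it assumes `<i>` tags are not nested inside each other and that an `<i>` contains only text and `<b>` children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def italics_to_markdown(italic):
    """Build the Markdown for one <i> element (hypothetical helper)."""
    parts = []
    for child in italic.children:
        if isinstance(child, Tag) and child.name == "b":
            # Bold inside italics: three asterisks.
            text, marker = child.get_text().strip(), "***"
        elif isinstance(child, Tag):
            # Any other nested tag: keep only its text, as plain italics.
            text, marker = child.get_text().strip(), "*"
        else:
            # Plain text node inside the <i>.
            text, marker = str(child).strip(), "*"
        if text:
            parts.append(marker + text + marker)
    return " ".join(parts)


def html_to_reddit(html):
    """Convert <i> and <b> tags to asterisk markup (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for italic in soup.find_all("i"):
        italic.replace_with(italics_to_markdown(italic))
    # Bold tags that were not inside italics get two asterisks.
    for bold in soup.find_all("b"):
        bold.replace_with("**" + bold.get_text().strip() + "**")
    return str(soup)


example = "<i> This is Italics <b> but this is bold and italics</b>  and back to italics </i>"
print(html_to_reddit(example))
```

Handling the `<i>` elements first means any `<b>` still left in the tree afterwards is a standalone bold tag, so the second loop can safely use `**`.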


 