
How to replace all occurrences of a particular nested tag

I want to extract the text from the tags of an HTML file without duplicates caused by nested tags. A BeautifulSoup-based solution is preferred, but a regex would also work. For example:

`<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>`

Expected output:

HELLO1 HELLO2 HELLO3

I tried a string replacement, but I could not work out how to apply it to the soup object: str(soup).replace("<li><p>", "<p>")

tags = soup.find_all(['p','li'])
It returns:
<p>HELLO1</p>,
HELLO1,
HELLO2,
HELLO3

If li and p tags are nested, the result should show the text only once, or one of the nested tags should be removed. For example, <li><p>XYZ</p></li> should become <li>XYZ</li>.

You could use the .get_text() method, which returns only the text of the document, so nested tags do not produce duplicates:

data = '''<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

print(soup.get_text(separator=' ', strip=True))

Prints:

HELLO1 HELLO2 HELLO3
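
If the tags themselves should be kept and only the nesting removed, as in the <li><p>XYZ</p></li> case from the question, one possible approach is BeautifulSoup's unwrap(), which replaces a tag with its contents. The following is a minimal sketch rather than a definitive solution; the helper name unwrap_nested_p is hypothetical, and it assumes that only p tags inside li tags need to be removed:

```python
from bs4 import BeautifulSoup


def unwrap_nested_p(html):
    """Return html with every <p> that sits inside an <li> unwrapped.

    Hypothetical helper for illustration; assumes only <p> inside <li>
    counts as unwanted nesting.
    """
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find_all("li"):
        # find_all returns a list, so modifying the tree in the loop is safe
        for p in li.find_all("p"):
            p.unwrap()  # replace <p>...</p> with its contents
    return str(soup)


print(unwrap_nested_p("<li><p>XYZ</p></li>"))  # prints: <li>XYZ</li>
```

After unwrapping, soup.find_all(['p', 'li']) no longer matches the same text twice, because each piece of text is left with a single enclosing tag.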

