How to replace all occurrences of a particular nested tag

Question

I want to extract the text of the tags in an HTML file without the duplicates caused by nested tags. I'd prefer a BeautifulSoup-based solution, but regex will also work. For example:

`<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>`

Expected output:

HELLO1 HELLO2 HELLO3

I tried regex, but couldn't work out how to apply it to a soup object. I also tried plain string replacement: `str(soup).replace("<li><p>", "<p>")`
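Since regex is acceptable too, here is a minimal sketch using only Python's `re` module. It works on the markup string (for a soup object you would run it on `str(soup)` and re-parse the result). It assumes the `<li>` and `<p>` tags carry no attributes and that the `<p>` does not contain another `<p>`; regex is fragile on real-world HTML, so treat this as a quick fix rather than a parser. The name `nested_li_p` is illustrative.

```python
import re

data = '<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'

# Matches <li><p>...</p></li>, tolerating whitespace inside the tags
# (e.g. "</li >"). Assumes the tags have no attributes.
nested_li_p = re.compile(
    r'<\s*li\s*>\s*<\s*p\s*>(.*?)<\s*/\s*p\s*>\s*<\s*/\s*li\s*>',
    re.IGNORECASE | re.DOTALL,
)

# Keep the <li> wrapper and drop the inner <p>.
unwrapped = nested_li_p.sub(r'<li>\1</li>', data)
print(unwrapped)  # <li>HELLO1</li>  <li>HELLO2</li><p>HELLO3</p>

# Replace every remaining tag with a space, then collapse whitespace.
text = ' '.join(re.sub(r'<[^>]*>', ' ', unwrapped).split())
print(text)  # HELLO1 HELLO2 HELLO3
```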

tags = soup.find_all(['p','li'])

It returns:

<p>HELLO1</p>,
HELLO1,
HELLO2,
HELLO3
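`find_all(['p','li'])` matches both the outer `<li>` and the inner `<p>` around HELLO1, which is why HELLO1 shows up twice. One way around it, sketched here under the assumption that only `p` and `li` matter, is to keep only the innermost matches: tags that contain no other `p` or `li`. The names `wanted`, `innermost` and `texts` are illustrative.

```python
from bs4 import BeautifulSoup

data = '<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'
soup = BeautifulSoup(data, 'html.parser')

wanted = ['p', 'li']

# Keep a matched tag only if it has no wanted tag inside it,
# so the outer <li> around <p>HELLO1</p> is skipped.
innermost = [tag for tag in soup.find_all(wanted) if tag.find(wanted) is None]

texts = [tag.get_text(strip=True) for tag in innermost]
print(texts)  # ['HELLO1', 'HELLO2', 'HELLO3']
```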

If `li` and `p` tags are nested, the text should appear only once; in other words, one of the nested tags should be removed. For example, `<li><p>XYZ</p></li>` should become `<li>XYZ</li>`.
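BeautifulSoup's `Tag.unwrap()` replaces a tag with its own contents, which is the `<li><p>XYZ</p></li>` to `<li>XYZ</li>` transformation asked for here. A minimal sketch, assuming only a `<p>` that is a direct child of an `<li>` should be removed:

```python
from bs4 import BeautifulSoup

data = '<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'
soup = BeautifulSoup(data, 'html.parser')

# For every <li>, unwrap each <p> that is a direct child,
# so <li><p>X</p></li> becomes <li>X</li>.
for li in soup.find_all('li'):
    for p in li.find_all('p', recursive=False):
        p.unwrap()

result = str(soup)
print(result)

# Each piece of text is now wrapped by exactly one p or li tag.
texts = [tag.get_text(strip=True) for tag in soup.find_all(['p', 'li'])]
print(texts)  # ['HELLO1', 'HELLO2', 'HELLO3']
```

`find_all` returns a list, so unwrapping tags while looping over it is safe.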

Answer 1

You could use the `.get_text()` method:

from bs4 import BeautifulSoup

data = '''<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'''

soup = BeautifulSoup(data, 'html.parser')

# Join all text pieces with a space; strip=True trims each piece and drops whitespace-only ones.
print(soup.get_text(separator=' ', strip=True))

Prints:

HELLO1 HELLO2 HELLO3
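If installing BeautifulSoup is not an option, a rough standard-library equivalent of `get_text(separator=' ', strip=True)` can be built on `html.parser.HTMLParser`. This is a sketch, not what the answer above uses; `TextCollector` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-blank text run, ignoring the tags around it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Trim each text run and skip whitespace-only runs,
        # similar to get_text(strip=True).
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(markup, separator=' '):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return separator.join(parser.chunks)


data = '''<li><p>HELLO1</p></li >  <li>HELLO2</li><p>HELLO3</p>'''
print(extract_text(data))  # HELLO1 HELLO2 HELLO3
```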

solution1
1 2019-07-17 04:37:31