I have an HTML tree and only want part of it: everything that comes before a certain tag containing a given string. The example contains only one <b> tag with the string Notes, but there could be several. For example, this:
<br/>
Hello
<br/>
<b>
Notes
</b>
<br/>
Hello
<a name="test">
Hello2
</a>
should become:
<br/>
Hello
<br/>
My code only gives me the desired output as a list of strings, not as an HTML tree:
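from bs4 import BeautifulSoup

# book.html contains the example from above
with open('book.html', 'r') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

result = None
for tag in soup.find_all('b'):
    if tag.text.strip() == 'Notes':
        # find_all_previous(string=True) returns only the preceding text
        # nodes (nearest first) as a list, so the tags are lost
        result = tag.find_all_previous(string=True)
        break

print(result)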
How can I get the same result as HTML instead of a list?
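A minimal non-destructive sketch, assuming (as in the example) that the marker tag and the content to keep share the same parent: walk `previous_siblings`, which yields tags and text nodes nearest first, reverse them back into document order, and join their HTML. `html_before` is a hypothetical helper name, and the inline `html` string stands in for `book.html`. Text nodes are rendered with `output_ready()` rather than `str()` so that characters such as `&` are escaped the same way `str()` escapes text inside tags.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<br/>
Hello
<br/>
<b>
Notes
</b>
<br/>
Hello
<a name="test">
Hello2
</a>
"""


def html_before(soup, tag_name, marker):
    """Return the HTML of the siblings preceding the first matching tag.

    Assumption: the marker tag and the content to keep share the same parent.
    Returns None if no tag matches. The soup itself is not modified.
    """
    for tag in soup.find_all(tag_name):
        if tag.get_text().strip() == marker:
            # previous_siblings runs backwards (nearest first), so reverse it
            siblings = list(tag.previous_siblings)[::-1]
            return ''.join(
                # output_ready() escapes text nodes; str() renders tags as HTML
                node.output_ready() if isinstance(node, NavigableString) else str(node)
                for node in siblings
            )
    return None


soup = BeautifulSoup(html, 'html.parser')
print(html_before(soup, 'b', 'Notes'))
```

If the marker can be nested deeper than the content you want to keep, siblings are not enough and you need to cut the tree in document order instead.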
Solution
I located the desired tag, removed every element after it, and then removed the tag itself.
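from bs4 import BeautifulSoup

# book.html contains the example from above
with open('book.html', 'r') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

for tag in soup.find_all('b'):
    if tag.text.strip() == 'Notes':
        # Remove everything after the tag. next_elements yields text nodes as
        # well as tags (find_all_next() without arguments yields tags only),
        # so copy it to a list before modifying the tree.
        for element in list(tag.next_elements):
            element.extract()
        # Then remove the <b>Notes</b> tag itself.
        tag.extract()
        break  # keep only the part above the first match

print(soup.prettify())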
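The same cut can be wrapped in a reusable function that takes an HTML string, which is handy for quick checks without `book.html`. This is a sketch: `truncate_before` is a hypothetical name, and when several tags match it keeps only what precedes the first one. Because `next_elements` follows document order, it also works when the marker is nested: its ancestors are kept and only later content, tags and text alike, is removed.

```python
from bs4 import BeautifulSoup


def truncate_before(html, tag_name, marker):
    """Return `html` cut off just before the first <tag_name> whose text is `marker`.

    Ancestors of the marker tag are kept; everything after it in document
    order is removed. Without a match the document is returned unchanged.
    """
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup.find_all(tag_name):
        if tag.get_text().strip() == marker:
            # next_elements yields every later tag *and* text node; copy it to
            # a list first because extract() rewires those links
            for element in list(tag.next_elements):
                element.extract()
            tag.extract()  # drop the marker tag itself
            break  # several markers: keep only what precedes the first
    return str(soup)


# The marker may be nested: its <div> ancestor survives, later content does not
nested = '<div><p>Hello</p><b>Notes</b><p>Later</p></div><p>After</p><b>Notes</b>'
print(truncate_before(nested, 'b', 'Notes'))  # <div><p>Hello</p></div>
```

Parsing inside the function means the caller's document is never mutated; pass `str(soup)` if you already have a parsed tree.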