Remove selected tag in an element with BeautifulSoup

A page has several h1 elements. In the first h1, I want to remove the tag with class read-time. Here is my attempt, but the tag is not being deleted. Where am I going wrong?

h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

if real_h1.select('.read-time') is not None:
    real_h1.select('.read-time').clear()

print("AFTER: main.select('h1')", main.select('h1'))

Log:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]

Use decompose() to delete the tag. Call it on the .read-time element itself; calling it on real_h1 would remove the whole h1.

from bs4 import BeautifulSoup

html = '''<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>'''
main = BeautifulSoup(html, 'html.parser')
h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

# select_one() returns the first match, or None when nothing matches
read_time = real_h1.select_one('.read-time')
if read_time is not None:
    read_time.decompose()

print("AFTER: main.select('h1')", main.select('h1'))

Output:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction</h1>, <h1 id="before-you-begin">Before You Begin</h1>]
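
For comparison, here is a minimal sketch of how three Tag methods, clear(), decompose(), and extract(), treat the nested span. It assumes the first h1 from the question as input. The helper name first_h1_after is made up for illustration and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

HTML = '<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>'


def first_h1_after(method_name):
    """Hypothetical helper: parse HTML, call the named Tag method on the
    .read-time span, and return the resulting h1 as a string."""
    soup = BeautifulSoup(HTML, 'html.parser')
    span = soup.select_one('h1 .read-time')
    getattr(span, method_name)()  # e.g. span.decompose()
    return str(soup.h1)


print(first_h1_after('clear'))      # <h1>Introduction<span class="read-time"></span></h1>
print(first_h1_after('decompose'))  # <h1>Introduction</h1>
print(first_h1_after('extract'))    # <h1>Introduction</h1>
```

On a single tag, clear() empties the span but leaves the empty tag in place. decompose() and extract() both remove it from the tree; extract() also returns the removed tag, which is useful if it needs to be reinserted elsewhere.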

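.select() returns a list (a ResultSet), never None, so the "is not None" check always passes. Calling .clear() on that list only empties the list itself and leaves the document untouched. Iterate through the list and call decompose() on each element, as KunduK suggested: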

h1s = main.select('h1')
print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

read_times = real_h1.select(".read-time")
for span in read_times:
    span.decompose()

print("AFTER: main.select('h1')", main.select('h1'))

Output:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction</h1>, <h1 id="before-you-begin">Before You Begin</h1>]
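
To see why the original attempt had no visible effect, here is a small sketch using the first h1 from the question. It relies on select() returning a ResultSet, which is a subclass of list. The variable names matches and span_still_there are illustrative only.

```python
from bs4 import BeautifulSoup

html = '<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>'
main = BeautifulSoup(html, 'html.parser')
real_h1 = main.select('h1')[0]

matches = real_h1.select('.read-time')
print(isinstance(matches, list))                      # True: a ResultSet is a list subclass
print(real_h1.select('.no-such-class') is not None)   # True: no match gives an empty list, not None

# list.clear() only empties this Python list; the parse tree is not touched.
matches.clear()
span_still_there = real_h1.select_one('.read-time') is not None
print(span_still_there)                               # True

# Calling decompose() on each matched Tag removes it from the tree.
for span in real_h1.select('.read-time'):
    span.decompose()
print(real_h1)                                        # <h1>Introduction</h1>
```

Iterating over the result of select() is safe here because the list is a separate container; decomposing a tag changes the tree, not the list being looped over.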


 