Remove selected tag in an element with BeautifulSoup

A page has several h1 elements. In the first h1, I want to remove the tag with class read-time. Here is my attempt, but the tag is not being deleted. Where am I going wrong?

h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

if real_h1.select('.read-time') is not None:
    real_h1.select('.read-time').clear()

print("AFTER: main.select('h1')", main.select('h1'))

Log:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]

Use decompose() to delete.

from bs4 import BeautifulSoup

html = '''<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1><h1 id="before-you-begin">Before You Begin</h1>'''
main = BeautifulSoup(html, 'html.parser')
h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

# select_one() returns the first matching Tag, or None if nothing matches
read_time = real_h1.select_one('.read-time')
if read_time is not None:
    # decompose the span itself, not the enclosing h1
    read_time.decompose()

print("AFTER: main.select('h1')", main.select('h1'))

Output:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction</h1>, <h1 id="before-you-begin">Before You Begin</h1>]
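
For comparison, here is a minimal sketch of how clear(), decompose() and extract() behave when called on a single Tag. The markup and the helper name first_heading are made up for illustration; they are not from the question.

```python
from bs4 import BeautifulSoup


def first_heading(markup):
    """Hypothetical helper: parse the markup and return its first h1."""
    return BeautifulSoup(markup, "html.parser").h1


# Illustrative markup, simplified from the question's page.
markup = '<h1>Intro<span class="read-time">5 min read</span></h1>'

# Tag.clear() empties the span but leaves the empty tag in the tree.
h1_clear = first_heading(markup)
h1_clear.select_one(".read-time").clear()

# Tag.decompose() removes the span from the tree and destroys it.
h1_decompose = first_heading(markup)
h1_decompose.select_one(".read-time").decompose()

# Tag.extract() removes the span from the tree and returns it.
h1_extract = first_heading(markup)
removed = h1_extract.select_one(".read-time").extract()

print(h1_clear)      # <h1>Intro<span class="read-time"></span></h1>
print(h1_decompose)  # <h1>Intro</h1>
print(h1_extract)    # <h1>Intro</h1>
print(removed)       # <span class="read-time">5 min read</span>
```

If the removed element is not needed afterwards, decompose() is probably the simplest choice; extract() is useful when the tag should be reinserted elsewhere.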

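.select() returns a list (a ResultSet), never None, so the is not None check always passes, and calling .clear() on that list only empties the list; the document itself is left untouched. Iterate through the list and decompose each element, as KunduK suggested: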

h1s = main.select('h1')
print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

read_times = real_h1.select(".read-time")
for span in read_times:
    span.decompose()

print("AFTER: main.select('h1')", main.select('h1'))

Output:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction</h1>, <h1 id="before-you-begin">Before You Begin</h1>]
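
To see why the attempt in the question had no effect, the following self-contained sketch reproduces it on the same markup. It assumes the html.parser backend; the variable names still_there and first_h1 are illustrative only.

```python
from bs4 import BeautifulSoup

html = (
    '<h1>Introduction<span class="read-time">'
    '<span class="minutes"></span> min read</span></h1>'
    '<h1 id="before-you-begin">Before You Begin</h1>'
)
main = BeautifulSoup(html, "html.parser")
first_h1 = main.select("h1")[0]

# select() returns a list-like ResultSet, never None, so an
# "is not None" guard on its result is always true.
matches = first_h1.select(".read-time")

# On a list, clear() is list.clear(): it empties the result list
# but does not touch the parsed document.
matches.clear()
still_there = first_h1.select_one(".read-time") is not None

# select_one() returns a single Tag (or None), and calling
# decompose() on that Tag does modify the tree.
read_time = first_h1.select_one(".read-time")
if read_time is not None:
    read_time.decompose()

print(still_there)  # True: the span survived matches.clear()
print(first_h1)     # <h1>Introduction</h1>
```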

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address.

 