Unable to find an HTML element in Beautiful Soup

Question

Hello experts, I am working on a very challenging task. This is the HTML I have:

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>

<p><span style="font-family:arial; font-size:small"><span style="font-size:medium"><strong>Facebook Details Data: </strong></span>Data is always gathered valid <a href="https://www.facebook.com/users/09" ><strong>Facebook Web Scraping</strong></a></span></p>

<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the output I am trying to achieve: remove the complete element that contains facebook.com. The third line of the HTML should be removed, since it has facebook.com in it.

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>
    
<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the code I have tried:

getDetails = soup2.find('div', class_='post-body entry-content')
toRemove = "www.facebook.com"
try:
    for headless in (getDetails for getDetails in getDetails.find_all('a') if any( getDetails.find(toRemove))):
        headless.decompose()
except:
    print("facebook not found")

But this code isn't working; the output always has facebook.com in it. I have tried everything, but nothing works for me. It's quite a challenge. Please help me achieve the goal. Thanks.

Answer 1

Try using .parents, which yields the parent tags of an element. Choose the appropriate tag from that list and call its decompose() method:

# .parents yields ancestors nearest-first: here [0] is the enclosing <span>, [1] is the <p>
if "facebook.com" in soup.find("a")['href']:
    main_parent_tag = list(soup.find("a").parents)[1]
    main_parent_tag.decompose()
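
As for why the code in the question never removes anything: find() matches on tag names, so getDetails.find("www.facebook.com") looks for a tag with that name and returns None. any(None) then raises a TypeError, and the bare except hides it behind the "facebook not found" message. Below is a minimal sketch of a more general version of the snippet above. It checks every link rather than only the first one, and it assumes the element to drop is always the enclosing <p>. The helper name remove_paragraphs_linking_to and the trimmed sample markup are made up for illustration; they are not from the question.

```python
from bs4 import BeautifulSoup

# Trimmed sample markup, loosely based on the question and wrapped in the
# div that the question's code selects (illustrative only).
html = """
<div class="post-body entry-content">
<p><span><strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years</span></p>
<p><span><strong>Facebook Details Data: </strong>Data is always gathered valid
<a href="https://www.facebook.com/users/09"><strong>Facebook Web Scraping</strong></a></span></p>
<p><span><strong>Districts:</strong> Candidates only from the following districts</span></p>
</div>
"""


def remove_paragraphs_linking_to(container, needle):
    """Remove each <p> in `container` that holds a link whose href contains `needle`.

    Hypothetical helper; returns the number of paragraphs removed.
    """
    # Collect the paragraphs first, so no tag is inspected after it has
    # already been decomposed (two matching links may share one paragraph).
    doomed = []
    for link in container.find_all("a", href=True):
        if needle in link["href"]:
            paragraph = link.find_parent("p")
            # Compare by identity: Tag equality compares markup, not objects.
            if paragraph is not None and all(paragraph is not p for p in doomed):
                doomed.append(paragraph)
    for paragraph in doomed:
        paragraph.decompose()
    return len(doomed)


soup = BeautifulSoup(html, "html.parser")
details = soup.find("div", class_="post-body entry-content")
removed = remove_paragraphs_linking_to(details, "facebook.com")
print(removed)
print(details)
```

Using find_parent("p") instead of a fixed index into .parents keeps this working if the link sits at a different nesting depth, but it does rely on the link being inside a <p>; links outside any paragraph are left alone.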

Solution 1
1 2021-11-11 07:02:47
