
Unable to locate HTML element in Beautiful Soup

Hello experts, I am working on a very challenging task. This is the HTML I have:

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>

<p><span style="font-family:arial; font-size:small"><span style="font-size:medium"><strong>Facebook Details Data: </strong></span>Data is always gathered valid <a href="https://www.facebook.com/users/09" ><strong>Facebook Web Scraping</strong></a></span></p>

<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the output I am trying to achieve: the complete element that contains facebook.com should be removed (the third line of the HTML above, since it has facebook.com in it):

<strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years<br />
<strong>OBC (Non-Creamy Layer):</strong> Minimum 18 Years and Maximum 33 Years&nbsp;</span></p>
    
<p><span style="font-family:arial; font-size:small">&nbsp;</span></p>

<p><span style="font-size:large"><span style="font-family:arial"><span style="font-size:small"><strong><span style="font-size:medium">Districts:</span> </strong>Candidates only from the following districts of Assam can apply for these posts: &nbsp;<br />

This is the code I have tried:

getDetails = soup2.find('div', class_='post-body entry-content')
toRemove = "www.facebook.com"
try:
    for headless in (getDetails for getDetails in getDetails.find_all('a') if any( getDetails.find(toRemove))):
        headless.decompose()
except:
    print("facebook not found")

But this code isn't working; the output always has facebook.com in it. I have tried everything, but nothing works for me. It is quite a challenge. Please help me achieve the goal. Thanks.

Try using .parents, which yields the ancestors of a tag from the innermost outward. Choose the appropriate tag from that list and call its decompose() method:

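link = soup.find("a", href=True)  # first <a> that actually has an href
if link is not None and "facebook.com" in link["href"]:
    # For the HTML in the question, parents[0] is the <span> and parents[1] the <p>
    main_parent_tag = list(link.parents)[1]
    main_parent_tag.decompose()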
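
As far as I can tell, the attempt in the question fails because getDetails.find(toRemove) searches for a tag named "www.facebook.com" rather than for text in the href. It returns None, any(None) raises a TypeError, and the bare except hides that error behind the "facebook not found" message. The snippet above also only inspects the first link. A minimal sketch that checks every link might look like the following. The helper name remove_blocks_linking_to, its parameters, and the shortened sample markup are my own assumptions for illustration, as is the choice of the enclosing <p> as the element to drop.

```python
from bs4 import BeautifulSoup

# Shortened sample modeled on the question's HTML (an assumption, not the real page).
html = """
<div class="post-body entry-content">
<p><span><strong>SC/ST:</strong> Minimum 18 Years and Maximum 35 Years</span></p>
<p><span><strong>Facebook Details Data: </strong>Data is always gathered valid
<a href="https://www.facebook.com/users/09"><strong>Facebook Web Scraping</strong></a></span></p>
<p><span><strong>Districts:</strong> Candidates only from the following districts</span></p>
</div>
"""


def remove_blocks_linking_to(container, needle, parent_name="p"):
    """Remove the nearest <parent_name> ancestor of every <a> whose href contains needle.

    Hypothetical helper: returns the number of elements removed.
    """
    blocks = []
    # Collect the targets first, so the tree is not modified while it is being searched.
    for link in container.find_all("a", href=True):
        if needle in link["href"]:
            block = link.find_parent(parent_name)
            # Skip links without such an ancestor, and avoid listing a block twice.
            if block is not None and not any(block is seen for seen in blocks):
                blocks.append(block)
    for block in blocks:
        block.decompose()
    return len(blocks)


soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="post-body entry-content")
removed = remove_blocks_linking_to(container, "facebook.com")
print(removed, "element(s) removed")
print(container)
```

Using find_parent("p") instead of a fixed index into .parents keeps the code working when the link is nested at a different depth, at the cost of assuming that the unwanted block is always a paragraph.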
