
Remove items from a webpage by comparing it with a list (Python)

I have gathered the links that need to be removed into a list. The code below builds that list:

target = "www.indigo.com"
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)

It prints the following output:

['https://www.indigo.com/registration.html']
[]
['https://www.indigo.com/buservfcl.html', 'https://www.indigo.com/2021/07/agents.html']

getDetails holds the complete page source.

Now, how do I compare getDetails with the hrefs list and remove/decompose every item that is present in the list?

I tried this, but it doesn't work:

hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
for z in hrefs:
    getDetails.decompose()

It removed all the data in getDetails, but I need to remove only the elements that are in the list, not everything.

The output should be the complete HTML except for the elements that have www.indigo.com in them.

You have to find the parent tag and then use the decompose() method. The list must hold the tag objects themselves rather than the href strings, because decompose() is called on a tag.

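from bs4 import BeautifulSoup

html = """<div><a href="www.indigo.com"></a></div>"""

soup = BeautifulSoup(html, "html.parser")

target = "www.indigo.com"
# Keep the Tag objects (not just the href strings) so they can be removed later
href_tags = [link for link in soup.find_all('a', href=True) if target in link['href']]

for tag in href_tags:
    # Remove the element that encloses the matching link
    tag.parent.decompose()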

Output:

soup will be empty
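The soup ends up empty here because the only <div> is the parent of the matching link. As a rough illustration of the difference between decomposing the link itself and decomposing its parent, here is a small sketch. The HTML snippet, its ids and its text are made up for the example; only find_all() and decompose() come from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration
html = """
<div id="keep"><p>Intro text</p></div>
<div id="mixed"><span>Other text</span><a href="https://www.indigo.com/registration.html">Register</a></div>
"""
target = "www.indigo.com"

# Option A: remove only the matching <a> tags
soup_a = BeautifulSoup(html, "html.parser")
for link in soup_a.find_all("a", href=True):
    if target in link["href"]:
        link.decompose()

# Option B: remove the parent of each matching <a> tag
soup_b = BeautifulSoup(html, "html.parser")
matches = [a for a in soup_b.find_all("a", href=True) if target in a["href"]]
for link in matches:
    link.parent.decompose()

print(soup_a)  # "Other text" survives, the link is gone
print(soup_b)  # the whole "mixed" div is gone, "Intro text" survives
```

Which option fits depends on whether the surrounding element should disappear along with the link.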

From a URL:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.assamcareer.com/2021/06/oil-india-limited.html")
soup = BeautifulSoup(res.text, "html.parser")
target = "www.assamcareer.com"
# Collect every <a> tag whose href contains the target domain
tags = [link for link in soup.find_all('a', href=True) if target in link['href']]
for tag in tags:
    tag.parent.decompose()

Updated answer:

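for title in root:
    # ... your existing code that builds getDetails goes here ...
    href_tags = [link for link in getDetails.find_all('a', href=True) if target in link['href']]
    print(href_tags)

    # Remove the matches inside the loop so every getDetails is cleaned
    for tag in href_tags:
        tag.parent.decompose()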
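To get the remaining HTML as a string, the steps above can be wrapped in a helper. This is only a sketch under stated assumptions: strip_links, remove_parent and the sample page are hypothetical names and data, not part of the original code. It also skips links that were already removed together with a shared parent, using the decomposed property, which requires Beautiful Soup 4.9.0 or later.

```python
from bs4 import BeautifulSoup


def strip_links(html, target, remove_parent=False):
    """Return html without the <a> tags whose href contains target.

    With remove_parent=True, the element enclosing each matching link
    is removed instead of just the link.
    """
    soup = BeautifulSoup(html, "html.parser")
    matches = [a for a in soup.find_all("a", href=True) if target in a["href"]]
    for link in matches:
        # A link may already be gone if an earlier iteration removed its
        # parent; the decomposed property needs Beautiful Soup 4.9.0+.
        if link.decomposed:
            continue
        node = link.parent if remove_parent else link
        node.decompose()
    return str(soup)


# Hypothetical sample page: two matching links share one <li>
page = (
    '<ul><li><a href="https://www.indigo.com/agents.html">Agents</a>'
    '<a href="https://www.indigo.com/registration.html">Register</a></li>'
    '<li><a href="https://example.org/">Elsewhere</a></li></ul>'
)
cleaned = strip_links(page, "www.indigo.com")
cleaned_parents = strip_links(page, "www.indigo.com", remove_parent=True)
print(cleaned)
print(cleaned_parents)
```

With remove_parent=False only the links disappear, which is closest to "the complete HTML except the ones that have www.indigo.com in them"; remove_parent=True mirrors the i.parent.decompose() approach.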
