Remove items from a webpage by comparing it with a list (Python)
I have gathered the links that need to be removed into a list. The code below builds that list:
target = "www.indigo.com"
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
It prints the following output:
['https://www.indigo.com/registration.html']
[]
['https://www.indigo.com/buservfcl.html', 'https://www.indigo.com/2021/07/agents.html']
getDetails holds the complete page source.
Now, how do I compare getDetails with the hrefs list and remove (decompose) every item that is present in the list?
I tried this, but it doesn't work for some reason:
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
for z in hrefs:
    getDetails.decompose()
It removed everything in getDetails, but I need to remove only the elements that are in the list, not everything.
The output should be the complete HTML except for the elements that contain www.indigo.com.
You have to find the parent tag and then use the decompose() method:
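from bs4 import BeautifulSoup

html = """<div><a href="www.indigo.com"></div>"""
soup = BeautifulSoup(html, "html.parser")
target = "www.indigo.com"
# Collect the <a> tags themselves (not just their href strings)
href_tags = [links for links in soup.find_all('a', href=True) if target in links['href']]
for i in href_tags:
    # Remove the element that contains the link, together with the link
    i.parent.decompose()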
Output: soup will be empty, because the only <div> is the link's parent and it is removed along with everything inside it.
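If the goal is to keep the rest of the page, as the question asks, one option is to decompose each matching <a> tag itself rather than its parent. Note that the loop in the question calls decompose() on the whole document and never uses the loop variable, which is why everything disappeared. The following is a minimal sketch under that assumption; the sample HTML, the example.org link, and the helper name remove_links are made up for illustration and do not come from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one link to remove, one link and one paragraph to keep
html = """<div>
<p>Intro text</p>
<a href="https://www.indigo.com/registration.html">Register</a>
<a href="https://example.org/about.html">About</a>
</div>"""


def remove_links(soup, target):
    """Decompose every <a> tag whose href contains `target`; return how many were removed."""
    removed = 0
    # find_all returns a list, so removing tags while looping over it is safe
    for link in soup.find_all("a", href=True):
        if target in link["href"]:
            link.decompose()  # removes only this <a> tag and its contents
            removed += 1
    return removed


soup = BeautifulSoup(html, "html.parser")
removed = remove_links(soup, "www.indigo.com")
print(removed)
print(soup)
```

Whether to decompose the link or its parent depends on how much surrounding markup should go: the parent variant also drops any sibling text or tags that share the same container.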
From the URL:
import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.assamcareer.com/2021/06/oil-india-limited.html")
soup = BeautifulSoup(res.text, "html.parser")
target = "www.assamcareer.com"
# Keep the tag objects so that decompose() can be called on them
tags = [links for links in soup.find_all('a', href=True) if target in links['href']]
for i in tags:
    i.parent.decompose()
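The check target in links['href'] is a plain substring test, so it also matches links that merely mention the domain somewhere in the URL, for example in a query string. If that matters, a possible refinement is to compare the host part of each URL instead. This is only a sketch: the helper name links_on_host and the sample HTML are assumptions made for illustration, and it matches only absolute URLs with a scheme (an href such as "www.indigo.com" without "https://" has no host part according to urlparse).

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical sample: only the first link actually points at www.indigo.com
html = """<ul>
<li><a href="https://www.indigo.com/2021/07/agents.html">Agents</a></li>
<li><a href="https://example.org/?ref=www.indigo.com">Mention only</a></li>
<li><a href="/local/page.html">Relative link</a></li>
</ul>"""


def links_on_host(soup, host):
    """Return the <a> tags whose href has exactly `host` as its network location."""
    return [a for a in soup.find_all("a", href=True)
            if urlparse(a["href"]).netloc == host]


soup = BeautifulSoup(html, "html.parser")
matches = links_on_host(soup, "www.indigo.com")
for a in matches:
    a.decompose()  # remove only the matching link

# hrefs that are still present in the document
remaining = [a["href"] for a in soup.find_all("a", href=True)]
print(remaining)
```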
Updated answer:
for title in root:
    # ... your code ...
    href_tags = [links for links in getDetails.find_all('a', href=True) if target in links['href']]
    print(href_tags)
    for i in href_tags:
        i.parent.decompose()