
Remove Items from webpage by comparing it with a List (Python)

I have collected the links that need to be removed into a list. The code below builds that list:

target = "www.indigo.com"
hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)

It prints the following output:

['https://www.indigo.com/registration.html']
[]
['https://www.indigo.com/buservfcl.html', 'https://www.indigo.com/2021/07/agents.html']

getDetails holds the complete page source.

How do I compare getDetails with the hrefs list and remove (decompose) every element that appears in the list?

I tried this, but it doesn't work:

hrefs = [links['href'] for links in getDetails.find_all('a', href=True) if target in links['href']]
print(hrefs)
for z in hrefs:
    getDetails.decompose()

It removes everything in getDetails, but I need to remove only the elements that are in the list, not the whole page.

The output should be the complete HTML except for the elements that contain www.indigo.com.

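You have to collect the matching tags themselves (not just their href strings), find each tag's parent, and call the decompose() method on it. Note that this removes the whole parent element, including anything else inside it.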

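from bs4 import BeautifulSoup

html = """<div><a href="www.indigo.com"></a></div>"""

soup = BeautifulSoup(html, "html.parser")

target = "www.indigo.com"
# Keep the Tag objects rather than the href strings so they can be removed from the tree
href_tags = [links for links in soup.find_all('a', href=True) if target in links['href']]

for i in href_tags:
    # Removes the enclosing element (here the <div>) together with the link
    i.parent.decompose()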

Output:

soup will be empty
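
If the surrounding content should stay and only the links themselves should go, one option is to call decompose() on each matching <a> tag instead of on its parent. The following is a minimal sketch under that assumption. The sample HTML, the example.org link, and the helper name remove_links are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup


def remove_links(soup, target):
    """Remove every <a> whose href contains `target`; return how many were removed."""
    # find_all returns a list-like snapshot, so removing tags while looping is safe
    matches = [a for a in soup.find_all("a", href=True) if target in a["href"]]
    for a in matches:
        a.decompose()  # removes only the <a> tag and its own contents
    return len(matches)


# Hypothetical sample page: one link to drop, one link to keep
html = """<div>
<p>Intro text <a href="https://www.indigo.com/registration.html">Register</a></p>
<p>Other <a href="https://example.org/page.html">Elsewhere</a></p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
removed = remove_links(soup, "www.indigo.com")
print(removed)
print(soup)
```

Here the paragraph text around the removed link is kept, whereas the parent-based version would drop the whole <p>.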

From the URL:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.assamcareer.com/2021/06/oil-india-limited.html")
soup = BeautifulSoup(res.text, "html.parser")
target = "www.assamcareer.com"
# Collect the matching <a> tags, then remove each one's parent element
tags = [links for links in soup.find_all('a', href=True) if target in links['href']]
for i in tags:
    i.parent.decompose()
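
The substring test `target in links['href']` also matches URLs that merely mention the domain somewhere in their path or query string. A stricter variant, sketched below, compares the hostname parsed by urllib.parse.urlparse. The helper name links_to_host and the sample HTML are hypothetical, and relative hrefs (which have no hostname) are simply left alone in this sketch.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup


def links_to_host(soup, host):
    """Return the <a> tags whose href points at exactly `host`."""
    return [
        a for a in soup.find_all("a", href=True)
        if urlparse(a["href"]).hostname == host
    ]


# Hypothetical sample: the second link only mentions the domain in its query string
html = """<p><a href="https://www.assamcareer.com/2021/06/x.html">internal</a>
<a href="https://example.org/?ref=www.assamcareer.com">external</a></p>"""

soup = BeautifulSoup(html, "html.parser")
for a in links_to_host(soup, "www.assamcareer.com"):
    a.decompose()

remaining = [a["href"] for a in soup.find_all("a", href=True)]
print(remaining)
```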

Updated answer:

for title in root:
    # ... your code ...
    href_tags = [links for links in getDetails.find_all('a', href=True) if target in links['href']]
    print(href_tags)

    # Run the removal inside the loop so every page's matches are handled
    for i in href_tags:
        i.parent.decompose()


 