Delete dynamic elements from HTML with Selenium and Python

I've used BeautifulSoup to find a specific div class in the page's HTML. I want to check whether this div has a span with a given class inside it. If the div has that span, I want to keep it in the page's code; if it doesn't, I want to delete it, maybe using Selenium.

For that I have two lists selecting the elements (div and span). I tried to check if one list is inside the other, and that kind of worked. But how can one delete that found element from the page's source code?
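For reference, the check described here can be done per div instead of comparing two lists. This is only a minimal sketch: the markup and the class names `card` and `badge` are made up for illustration, since the real classes are not given above, and `keep_divs_with_span` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and class names; the real page uses different ones.
SAMPLE_HTML = """
<div class="card" id="a"><span class="badge">kept</span></div>
<div class="card" id="b"><p>no span here</p></div>
"""

def keep_divs_with_span(html, div_class="card", span_class="badge"):
    """Parse html and drop every matching div that has no matching span inside."""
    soup = BeautifulSoup(html, "html.parser")
    for div in soup.find_all("div", class_=div_class):
        # find() searches the div's descendants and returns None when nothing matches
        if div.find("span", class_=span_class) is None:
            div.decompose()
    return soup

soup = keep_divs_with_span(SAMPLE_HTML)
remaining_ids = [d["id"] for d in soup.find_all("div", class_="card")]
print(remaining_ids)
```

This only changes the parsed copy held by BeautifulSoup. The page shown in the browser is untouched, so removing elements from the live page still needs JavaScript run through Selenium.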

Edit

I've edited the code after a few conversations in the comments section. With help, I was able to implement code that removes elements by executing JavaScript.

The code is running with no errors, but nothing is being deleted from the page.

# Import required modules
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

# Launch the browser in incognito mode
options = Options()
options.add_argument("--incognito")
#options.add_argument("--headless")

# Use ChromeDriver; Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Raw string, so the \d in the regex reaches the browser without Python treating it as an escape
driver.execute_script(r"""
  for(let div of document.querySelectorAll('div._99s5')){
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")

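Since you're deleting them in JavaScript anyway:

driver.execute_script(r"""
  // Walk every ad card on the page
  for(let div of document.querySelectorAll('div._99s5')){
    // Read the count from text such as "12 ads use this creative and text"
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    // Cards with fewer than 10 ads (or no match at all) lose their .tp-logo child
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")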
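To sanity-check the counting rule outside the browser, the same regex can be run in plain Python. This is only a sketch mirroring the script above. The helper names `num_ads` and `should_strip` and the sample strings are made up for illustration.

```python
import re

# Same pattern as the JavaScript regex in the script above
PATTERN = re.compile(r"(\d+) ads? use this creative and text")

def num_ads(card_text):
    """Return the ad count found in card_text, or 0 when the phrase is absent."""
    match = PATTERN.search(card_text)
    return int(match.group(1)) if match else 0

def should_strip(card_text, threshold=10):
    """True when the script above would try to remove .tp-logo from this card."""
    return num_ads(card_text) < threshold

for text in ["25 ads use this creative and text", "3 ads use this creative and text", "Sponsored"]:
    print(repr(text), num_ads(text), should_strip(text))
```

Two things follow from reading the script this way. A card whose text does not match at all counts as 0 and is therefore treated as "fewer than 10". The script also removes only the `.tp-logo` descendant, not the card itself; if no element with that class exists inside the div, the `?.` makes the call a silent no-op, which would look like "nothing is being deleted".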

Note: The question and comments are a bit confusing, so it would help to clarify them. Assuming you want to decompose() some elements, it is not clear why, or what should happen after that step. So this answer only points out an approach.

To decompose() the elements that do not contain "ads use this creative and text", just negate your selection and iterate over the ResultSet:

# :not() is applied to the div itself, so only divs whose text lacks the phrase are selected
for e in soup.select('div._99s5:not(:-soup-contains("ads use this creative and text"))'):
    e.decompose()

These elements are then no longer included in your soup, and you can process it as needed.
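A self-contained way to try this without opening the browser is sketched below. The sample markup stands in for `driver.page_source` and is made up. `drop_cards_without_text` is a hypothetical helper. The selector applies `:not(:-soup-contains(...))` directly to the div, which is an assumption about the negation that is intended here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source; the real page differs.
SAMPLE_HTML = """
<div class="_99s5"><span>12 ads use this creative and text</span></div>
<div class="_99s5"><span>Some other card</span></div>
"""

def drop_cards_without_text(html, needle="ads use this creative and text"):
    """Return a soup in which every div._99s5 lacking needle has been removed."""
    soup = BeautifulSoup(html, "html.parser")
    # :-soup-contains() looks at the element's own text and that of its descendants
    for card in soup.select(f'div._99s5:not(:-soup-contains("{needle}"))'):
        card.decompose()
    return soup

soup = drop_cards_without_text(SAMPLE_HTML)
print(len(soup.select("div._99s5")))
```

Keep in mind that decompose() only edits the parsed copy. The page open in Selenium stays as it was, so this fits when the goal is to process the cleaned HTML afterwards rather than to change what the browser shows.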
