
Scraping a news aggregator website by clicking the "more news" button with Selenium

I want to scrape news headlines from this link: https://www.newsnow.co.uk/h/Business+&+Finance?type=ln

I want to use Selenium to click the 'view more headlines' button repeatedly, so that the page expands and I can collect as many headlines as possible.

I wrote this code, but the click that should expand the list fails:

import time
from selenium import webdriver
u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'

driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")    
driver.implicitly_wait(60) # seconds

elem = driver.find_element_by_css_selector('span:contains("view more headlines")')
for i in range(10):
    elem.click()
    time.sleep(5)
    print(f'click {i} done')

This raises: selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified

I also tried an XPath selector:

elem = driver.find_element_by_xpath('//[@id="nn_container"]/div[2]/main/div[2]/div/div/div[3]/div/a')

This raises: selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a class="rs-button-more js-button-more btn--primary btn--primary--no-spacing" href="#">...</a> is not clickable at point (353, 551). Other element would receive the click: <div class="alerts-scroller">...</div>

After the first click, an overlay element covers the button, so later clicks are intercepted. The workaround is to use JavaScript to scroll the button into view before each click. Here is a working program:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'

# executable_path is the Selenium 3 way to point at the driver binary
driver = webdriver.Chrome(executable_path=r"C:\bin\chromedriver.exe")
driver.maximize_window()
driver.get(u)
time.sleep(10)  # give the page time to load
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

for i in range(10):
    # wait until the button label is clickable, then scroll it into view
    element = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.CLASS_NAME, 'btn--primary__label')))
    driver.execute_script("arguments[0].scrollIntoView();", element)
    element.click()
    time.sleep(5)  # let the next batch of headlines load
    print(f'click {i} done')
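
If the native click is still intercepted by the overlay, one common workaround is to dispatch the click from JavaScript, which does not check whether another element covers the button. The following is a minimal sketch, not the answerer's code: js_click and expand_all are hypothetical helper names, find_button is any callable you supply that returns the button element or None, and the defaults of 10 clicks and a 5 second pause are copied from the loop above.

```python
import time


def js_click(driver, element):
    """Scroll the element into view, then click it from JavaScript."""
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element)
    # A DOM click() fires on the element itself, even under an overlay.
    driver.execute_script("arguments[0].click();", element)


def expand_all(driver, find_button, max_clicks=10, pause=5.0):
    """Click the button until it is gone or max_clicks is reached.

    find_button(driver) must return the button element, or None when no
    button is left. Returns the number of clicks performed.
    """
    clicks = 0
    for _ in range(max_clicks):
        button = find_button(driver)
        if button is None:
            break  # nothing left to expand
        js_click(driver, button)
        clicks += 1
        time.sleep(pause)  # let the next batch of headlines load
    return clicks
```

With a real driver this would be called as expand_all(driver, find_button) once the page has loaded. Stopping when the button disappears avoids clicking a stale element after the last batch.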

This is the correct XPath (note the * after the leading //):

driver.find_element_by_xpath(r'//*[@id="nn_container"]/div[2]/main/div[2]/div/div/div[3]/div/a').click()
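
Absolute XPaths like the one above tend to break whenever the page layout shifts. As an alternative sketch, the class names printed in the question's error message (rs-button-more, js-button-more) allow a shorter CSS selector, and an XPath contains() test can stand in for :contains(), which is a jQuery extension rather than standard CSS and is what triggered the InvalidSelectorException. The helper names below are hypothetical, and whether these selectors still match the live page is an assumption.

```python
# String values of selenium's By.CSS_SELECTOR and By.XPATH strategies.
CSS_SELECTOR = "css selector"
XPATH = "xpath"

# Class taken from the <a> element shown in the question's error message
# (assumption: the live page still uses it).
MORE_BUTTON_CSS = "a.js-button-more"


def text_xpath(tag, text):
    """Build an XPath matching <tag> elements whose text contains `text`.

    The match is case-sensitive. Double quotes in `text` are rejected
    because the expression is wrapped in double quotes.
    """
    if '"' in text:
        raise ValueError("text must not contain double quotes")
    return f'//{tag}[contains(normalize-space(.), "{text}")]'


def find_more_button(driver):
    """Return the first displayed 'more' button, or None if there is none."""
    for element in driver.find_elements(CSS_SELECTOR, MORE_BUTTON_CSS):
        if element.is_displayed():
            return element
    return None
```

For the text-based lookup, driver.find_element(XPATH, text_xpath("span", "view more headlines")) would replace the failing span:contains(...) call from the question.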


 