Scraping a news aggregator website by clicking the "more news" button using Selenium
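I want to scrape news headlines from this link: https://www.newsnow.co.uk/h/Business+&+Finance?type=ln
I want to expand the list by clicking the "view more headlines" button with Selenium, so that I can collect as many headlines as possible.
I wrote this code, but it fails to click the button and expand the news: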
import time
from selenium import webdriver
u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.implicitly_wait(60) # seconds
elem = driver.find_element_by_css_selector('span:contains("view more headlines")')
for i in range(10):
    elem.click()
    time.sleep(5)
    print(f'click {i} done')
This returns: selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
I then tried an XPath selector instead:
elem = driver.find_element_by_xpath('//*[@id="nn_container"]/div[2]/main/div[2]/div/div/div[3]/div/a')
This returns: selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a class="rs-button-more js-button-more btn--primary btn--primary--no-spacing" href="#">...</a> is not clickable at point (353, 551). Other element would receive the click: <div class="alerts-scroller">...</div>
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <a class="rs-button-more js-button-more btn--primary btn--primary--no-spacing" href="#">...</a> is not clickable at point (353, 551). Other element would receive the click: <div class="alerts-scroller">...</div>
The button gets covered by an overlay element after it is clicked, so after the first click we use JavaScript to get to it. Here is the working program.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
u = 'https://www.newsnow.co.uk/h/Business+&+Finance?type=ln'
driver = webdriver.Chrome(executable_path=r"C:\bin\chromedriver.exe")
driver.maximize_window()
driver.get(u)
time.sleep(10)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
for i in range(10):
    element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CLASS_NAME, 'btn--primary__label')))
    driver.execute_script("arguments[0].scrollIntoView();", element)
    element.click()
    time.sleep(5)
    print(f'click {i} done')
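If the overlay still intercepts the click after scrolling, another option is to dispatch the click itself from JavaScript, which does not check whether a different element sits on top of the target. A minimal sketch, assuming a Selenium-style `driver` that exposes `execute_script` and an `element` that has already been located; the helper name `js_click` is hypothetical and not part of Selenium:

```python
def js_click(driver, element):
    """Scroll `element` into view and click it through JavaScript.

    The click is dispatched directly on the element inside the page,
    so an overlapping element (for example the alerts-scroller div)
    cannot receive it instead.
    """
    # Bring the element into the viewport first, as the program above does.
    driver.execute_script("arguments[0].scrollIntoView();", element)
    # Trigger the click from the page's own JavaScript context.
    driver.execute_script("arguments[0].click();", element)
```

In the loop above, `element.click()` could then be replaced with `js_click(driver, element)`. Locating the element again on every iteration is still advisable, since the page may re-render the button after each click.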
This is the correct XPath:
driver.find_element_by_xpath(r'//*[@id="nn_container"]/div[2]/main/div[2]/div/div/div[3]/div/a').click()
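The CSS attempt in the question failed because `:contains()` is not part of standard CSS, so the browser rejects the selector. XPath can match on text instead, which tends to be less brittle than a long positional path. A minimal sketch; `xpath_with_text` is a hypothetical helper, and the assumption that the label lives inside a `span` comes from the selector in the question, not from inspecting the page:

```python
def xpath_with_text(tag, text):
    """Build an XPath for `tag` elements whose text contains `text`.

    The match is case-sensitive. Text containing double quotes is
    rejected to keep the generated expression valid.
    """
    if '"' in text:
        raise ValueError("text must not contain double quotes")
    # normalize-space(.) collapses whitespace in the element's full text
    return f'//{tag}[contains(normalize-space(.), "{text}")]'


print(xpath_with_text("span", "view more headlines"))
```

The resulting string can be passed to the same `find_element_by_xpath` call used above, for example `driver.find_element_by_xpath(xpath_with_text("span", "view more headlines"))`.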