
How do I stop Selenium WebDriver after reaching the last page while scraping a website?

The amount of data (number of pages) on the site keeps changing, and I need to scrape all the pages by looping through the pagination. Website: https://monentreprise.bj/page/annonces

Code I tried:

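from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://monentreprise.bj/page/annonces")

xpath = "//*[@id='yw3']/li[12]/a"  # intended to point at the "next page" link
while True:
    next_page = driver.find_elements(By.XPATH, xpath)
    if len(next_page) < 1:  # stop when the link can no longer be found
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('ok')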

The loop never ends: "ok" is printed continuously, because the condition if len(next_page) < 1 is always False.

For instance, I tried the URL monentreprise.bj/page/annonces?Company_page=99999999999999999999999 and it returns page 13, which is the last page.
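That clamping behaviour suggests another route: walk the page numbers directly through the Company_page query parameter and stop as soon as the server serves a page number you have already seen. A minimal sketch, assuming a hypothetical fetch_page(url) callable that you would write yourself (for example, one that calls driver.get(url) and reads the active page number from the pager) and that returns the page number actually served plus that page's items. How to read the active page number from this site's markup is not covered above, so that part is left to the caller.

```python
from typing import Callable, List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://monentreprise.bj/page/annonces"


def page_url(page: int, base: str = BASE_URL) -> str:
    """Return `base` with its Company_page query parameter set to `page`."""
    parts = urlsplit(base)
    query = dict(parse_qsl(parts.query))
    query["Company_page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical callable: takes a URL, returns (page number actually served, items on it).
FetchPage = Callable[[str], Tuple[int, List[str]]]


def scrape_by_page_number(fetch_page: FetchPage) -> List[str]:
    """Request pages 1, 2, 3, ... until the server serves a page we already saw."""
    items: List[str] = []
    last_served = 0
    requested = 1
    while True:
        served, page_items = fetch_page(page_url(requested))
        if served <= last_served:
            break  # out of range: the site clamped the request back to the last page
        items.extend(page_items)
        last_served = served
        requested += 1
    return items
```

Because the stop condition depends only on what the server reports, this keeps working when the number of pages changes, without relying on any pagination button.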

What you could try is checking whether the "next page" button is disabled.
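A minimal sketch of that check. In Selenium, get_attribute("class") returns the element's class attribute as one space-separated string (or None if the attribute is missing), so splitting it into tokens tests for the exact class name rather than a substring; a plain "disabled" in text check would also match a hypothetical class such as not-disabled. The helper name is_disabled is illustrative, not part of Selenium.

```python
from typing import Optional


def is_disabled(class_attr: Optional[str]) -> bool:
    """True if the space-separated class string contains the exact token 'disabled'."""
    # get_attribute() returns None when the attribute is absent, so fall back to "".
    return "disabled" in (class_attr or "").split()


# Usage with Selenium (not run here):
#   parent = driver.find_element(By.CSS_SELECTOR, ".pagination .next")
#   if is_disabled(parent.get_attribute("class")):
#       ...  # last page reached
```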

There are several issues here:

  1. //*[@id='yw3']/li[12]/a is not the correct locator for the "next page" button.
  2. A better indicator that the last page has been reached is whether the element matched by the CSS selector .pagination .next has the disabled class.
  3. You have to scroll down the page before clicking the "next page" button.
  4. You have to add a delay after clicking the pagination button; otherwise the loop may check the page before the next one has loaded, and this will not work.

This code worked for me:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
my_url = "https://monentreprise.bj/page/annonces"
driver.get(my_url)
next_page_parent = '.pagination .next'          # element that wraps the "next" link
next_page_parent_arrow = '.pagination .next a'  # the clickable "next" link itself
while True:
    # Scroll to the bottom so the pagination bar is in view before clicking
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(0.5)
    parent = driver.find_element(By.CSS_SELECTOR, next_page_parent)
    class_name = parent.get_attribute("class")
    # On the last page the "next" wrapper carries the "disabled" class
    if "disabled" in class_name.split():
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_parent_arrow))).click()
        time.sleep(1.5)  # give the next page time to load before checking again
        print('ok')

driver.quit()  # stop the browser once the last page has been reached

The output is:

ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
No more pages
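Since the question is about stopping the WebDriver once the last page is reached, here is one way to package the loop above so that your scraping code runs on every page (including the first and the last) and the browser is always closed afterwards, even if you stop early. A minimal sketch, assuming the same selectors as the answer; the name iterate_pages is illustrative. It passes the string "css selector" (the value of Selenium's By.CSS_SELECTOR) so the function itself needs no Selenium import, and for brevity it drops the answer's WebDriverWait clickability check and keeps a fixed delay.

```python
import time
from typing import Iterator

CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR


def iterate_pages(driver, delay: float = 1.5) -> Iterator[int]:
    """Yield 1, 2, 3, ... once per page, then quit the driver after the last page."""
    page = 1
    try:
        while True:
            yield page  # the caller scrapes the currently loaded page here
            parent = driver.find_element(CSS, ".pagination .next")
            classes = (parent.get_attribute("class") or "").split()
            if "disabled" in classes:
                break  # last page reached
            # Bring the pagination bar into view, then click the "next" link
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
            driver.find_element(CSS, ".pagination .next a").click()
            time.sleep(delay)  # fixed wait for the next page, as in the answer
            page += 1
    finally:
        driver.quit()  # runs on normal exit and when the caller stops iterating early


# Usage (requires Selenium and a browser, not run here):
#   driver = webdriver.Chrome()
#   driver.get("https://monentreprise.bj/page/annonces")
#   for page in iterate_pages(driver):
#       print("scraping page", page)
```

If you prefer an explicit wait over the fixed delay, Selenium's EC.staleness_of(element) condition, used with WebDriverWait, waits until a previously found element is detached from the DOM; whether this site re-renders the pager on each page change is an assumption you would need to verify.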
