The amount of data (number of pages) on the site keeps changing, and I need to scrape all the pages by looping through the pagination. Website: https://monentreprise.bj/page/annonces
Code I tried:
xpath = "//*[@id='yw3']/li[12]/a"
while True:
    next_page = driver.find_elements(By.XPATH, xpath)
    if len(next_page) < 1:
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
        print('ok')
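"ok" is printed continuously because the condition if len(next_page) < 1 is always False: find_elements keeps finding an element at that XPath on every page, so the returned list is never empty and the loop never ends.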
For instance, I tried the URL monentreprise.bj/page/annonces?Company_page=99999999999999999999999 and it returns page 13, which is the last page.
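If the site really does serve the last page for any out-of-range Company_page value, as the observation above suggests, pages can also be walked by number without clicking: request page 1, 2, 3, ... and stop as soon as a page comes back identical to the previous one. A minimal sketch under that assumption; fetch_page is a hypothetical callable you would supply (built on requests, Selenium, etc.) that returns something comparable for a page, such as its list of listing titles, and Company_page is assumed to be the only pagination parameter.

```python
from typing import Callable, Iterator
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://monentreprise.bj/page/annonces"


def page_url(base: str, page: int) -> str:
    """Return base with the Company_page query parameter set to page."""
    parts = urlsplit(base)
    query = dict(parse_qsl(parts.query))  # keep any other query parameters
    query["Company_page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def iter_pages(fetch_page: Callable[[str], object], max_pages: int = 1000) -> Iterator[object]:
    """Yield the content of each page until a page repeats the previous one.

    Assumption: out-of-range page numbers return the last page again, so a
    repeat means the real last page has already been yielded.
    max_pages is a safety cap in case that assumption does not hold.
    """
    previous = None
    for page in range(1, max_pages + 1):
        content = fetch_page(page_url(BASE_URL, page))
        if content == previous:
            break
        yield content
        previous = content


# Usage with a fake 3-page site: any page number above 3 returns page 3.
fake_fetch = lambda url: "page %d" % min(int(url.rsplit("=", 1)[1]), 3)
print(list(iter_pages(fake_fetch)))  # ['page 1', 'page 2', 'page 3']
```

Comparing whole-page content is a blunt stop condition; if two genuine pages could ever be identical, compare something page-specific (for example the first listing) instead.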
What you could try is checking whether the "next page" button is disabled.
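One detail when checking the button's state: Selenium's get_attribute("class") returns the whole space-separated class string, so a plain substring test such as "disabled" in class_name would also match a hypothetical class name like not-disabled. A small helper (has_class is an illustrative name, not part of Selenium) that compares whole class tokens instead:

```python
from typing import Optional


def has_class(class_attr: Optional[str], name: str) -> bool:
    """Return True if name is one of the space-separated tokens in class_attr.

    class_attr is meant to be the value of element.get_attribute("class");
    None or an empty string is treated as "no classes".
    """
    return name in (class_attr or "").split()


print(has_class("next disabled", "disabled"))      # True
print(has_class("next not-disabled", "disabled"))  # False
```

In a Selenium loop this would read: if has_class(parent.get_attribute("class"), "disabled"): break.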
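There are several issues here:
- //*[@id='yw3']/li[12]/a is not a correct locator for the "next" pagination button.
- On the last page, the .pagination .next element has the disabled class, so checking for that class is a reliable way to detect the end of the pagination.

from selenium import webdriver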
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
my_url = "https://monentreprise.bj/page/annonces"
driver.get(my_url)
next_page_parent = '.pagination .next'
next_page_parent_arrow = '.pagination .next a'
while True:
    # Scroll to the bottom so the pagination bar is in view
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(0.5)
    parent = driver.find_element(By.CSS_SELECTOR, next_page_parent)
    class_name = parent.get_attribute("class")
    # On the last page the "next" item carries the "disabled" class
    if "disabled" in class_name:
        print("No more pages")
        break
    else:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_parent_arrow))).click()
        time.sleep(1.5)  # give the next page time to load
        print('ok')
The output is:
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
No more pages
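The loop above only clicks through the pages; the question also asks to scrape each one. A minimal, driver-agnostic sketch of that structure: read the current page first, then stop if there is no next page, otherwise advance. scrape_all_pages and its three callables are hypothetical names for illustration, and the max_pages cap guards against a stop condition that never fires, which was the original problem.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def scrape_all_pages(
    read_items: Callable[[], List[T]],
    has_next: Callable[[], bool],
    go_next: Callable[[], None],
    max_pages: int = 500,
) -> List[T]:
    """Collect items from every page, reading before advancing."""
    items: List[T] = []
    for _ in range(max_pages):
        items.extend(read_items())  # scrape first, so the last page is included
        if not has_next():
            break
        go_next()
    return items


# Usage with a fake 3-page site standing in for the browser:
pages = [["a", "b"], ["c"], ["d", "e"]]
state = {"page": 0}
result = scrape_all_pages(
    read_items=lambda: pages[state["page"]],
    has_next=lambda: state["page"] < len(pages) - 1,
    go_next=lambda: state.update(page=state["page"] + 1),
)
print(result)  # ['a', 'b', 'c', 'd', 'e']
```

With Selenium, read_items could return [e.text for e in driver.find_elements(By.CSS_SELECTOR, ...)] for whatever selector matches one listing (the thread does not give one), has_next could apply the answer's disabled-class check to .pagination .next, and go_next could click .pagination .next a. Assuming the click replaces the page content, WebDriverWait(driver, 10).until(EC.staleness_of(old_element)) on an element from the previous page is an alternative to the fixed time.sleep.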