Selenium clicking next button programmatically until the last page

Hi, I'm new to web scraping and have been trying to use Selenium to scrape a forum in Python.

I am trying to get Selenium to click "Next" until the last page, but I am not sure how to break out of the loop. I am also having trouble with the locator.

When I locate the Next button by partial link text, the automated clicking continues into the next thread, e.g. page 1 -> page 2 -> next thread -> page 1 of next thread -> page 2 of next thread:

while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Next")))
    next_link.click()

When I locate the Next button by class name, the automated clicking clicks the "Prev" button once it reaches the last page:

while True:
    next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
    next_link.click()

My questions are:

  1. Which locator should I use? (By class name, by partial link text, or something else?)
  2. How do I break the loop so it stops clicking when it reaches the last page?
  1. You can use any locator that identifies the element uniquely. Best practice suggests the following order of preference:

    • Id
    • Name
    • Class Name
    • CSS Selector
    • XPath
    • Others
  2. To break out of the while loop when the element can no longer be found, wrap the wait in a try block and call break in the except clause, as given below.

     from selenium.common.exceptions import TimeoutException

     while True:
         try:
             next_link = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "prevnext")))
             next_link.click()
         except TimeoutException:
             break
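As a rough sketch of the same break-on-timeout idea, the loop can be separated from the locator so that a safety cap stops it even if the locator keeps matching something (for example the Prev link on the last page). The names `click_until_last_page`, `make_selenium_finder` and `max_clicks` are hypothetical helpers introduced for illustration, not part of Selenium; the Selenium imports are done lazily inside the helper so the loop itself does not depend on the library.

```python
def click_until_last_page(find_next, max_clicks=500):
    """Click whatever find_next() returns until it returns None.

    Returns the number of clicks made. max_clicks is a safety cap so that
    a locator which never stops matching cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks:
        element = find_next()
        if element is None:  # no Next link found: last page reached
            break
        element.click()
        clicks += 1
    return clicks


def make_selenium_finder(driver, locator, timeout=10):
    """Build a find_next callable backed by Selenium (hypothetical helper)."""
    # Imported here so the module loads even where Selenium is not installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def find_next():
        try:
            # Wait until the element matched by the locator is clickable.
            return WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable(locator)
            )
        except TimeoutException:
            return None  # nothing matched within the timeout

    return find_next
```

With a live driver this might be called as `click_until_last_page(make_selenium_finder(driver, (By.CLASS_NAME, "prevnext")))`, with whichever locator turns out to match only the Next link.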

There are a few things you need to consider:

  • There are two elements on the page with the text Next, one at the top and one at the bottom, so you need to decide which one to interact with and construct a unique locator strategy for it.
  • Since you want to invoke click() on the element, use the expected condition element_to_be_clickable() instead of presence_of_element_located().
  • When no element with the text Next remains, the script should move on to the remaining steps, so invoke click() within a try-except block and break out of the loop when an exception is raised.
  • For your requirement, XPath looks like a suitable locator strategy.
  • Here is the working code block:

     from selenium import webdriver
     from selenium.common.exceptions import TimeoutException
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.support import expected_conditions as EC

     # Pager links in the first table after the "poststop" anchor (the top pager)
     PAGER_LINK = "//a[@id='poststop' and @name='poststop']//following::table[1]//li[@class='prevnext']/a"

     options = webdriver.ChromeOptions()
     options.add_argument("start-maximized")
     options.add_argument('disable-infobars')
     # Newer Selenium 4 releases no longer accept executable_path; adjust the driver setup to your version.
     driver = webdriver.Chrome(options=options, executable_path=r'C:\\Utility\\BrowserDrivers\\chromedriver.exe')
     driver.get("https://forums.hardwarezone.com.sg/money-mind-210/hdb-fully-paid-up-5744914.html")
     driver.find_element(By.XPATH, PAGER_LINK).click()
     while True:
         try:
             # Only click a pager link whose text contains "Next"
             WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, PAGER_LINK + "[contains(.,'Next')]"))).click()
         except TimeoutException:
             print("No more pages left")
             break
     driver.quit()
  • Console output:

     No more pages left 
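The Prev/Next mix-up from the question can also be handled on the Python side. This is a minimal sketch: it collects every link under the `prevnext` class and keeps only the one whose text contains "Next". The helper name `pick_next_link` is hypothetical, and the "‹ Prev" label is an assumption about how the forum renders its Prev link; the stand-in objects below only mimic the `.text` attribute of a Selenium WebElement.

```python
from types import SimpleNamespace


def pick_next_link(links, label="Next"):
    """Return the first link whose visible text contains label, else None."""
    for link in links:
        if label in link.text:
            return link
    return None


# Stand-ins for WebElement objects; only the .text attribute is used.
middle_page = [SimpleNamespace(text="‹ Prev"), SimpleNamespace(text="Next ›")]
last_page = [SimpleNamespace(text="‹ Prev")]

print(pick_next_link(middle_page).text)  # Next ›
print(pick_next_link(last_page))         # None
```

With a real driver, the list could come from `driver.find_elements(By.CSS_SELECTOR, "li.prevnext a")`, and a `None` result would be the signal to break out of the loop.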

You can use the code below to click the Next button until the last page is reached; the loop breaks once the button is no longer present:

from selenium.common.exceptions import TimeoutException

while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ›"))).click()
    except TimeoutException:
        break
