
Webdriver Selenium Not Navigating to the Next Page

I am writing a Python script using Chromedriver and Selenium to scrape data from a website. The problem is that after scraping data from one page, when the program navigates to the next page (via the .click() function), it sometimes continues to scrape data from the previous page.

Below is the relevant code:

from selenium import webdriver
import time

driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.testwebsite.com')

y = 0

while (y < 10):
    # scrape a value from the current page
    tempname = driver.find_element_by_xpath("//table[@id = 'T01']/tbody/tr[1]/td[2]").text

    # navigate to the next page
    driver.find_element_by_xpath("//button[@title = 'Next']").click()

    time.sleep(10)

    y += 1

As the code shows, I tried including a ten-second sleep, thinking that the webpage might just need some extra time to load before the program continued executing; however, that did not help.

I also tried to include an explicit wait whereby the program waits until the current webpage has a particular element on it before it continues to execute, as can be seen below:

WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.LINK_TEXT, '11')))

This did not work either, as the above will eventually time out since the element is never found, because the webdriver somehow still remains stuck on the previous page.

It is also worth noting that when watching the automated browser, the new pages do load, so the 'Next' button click does work; the issue is simply that the data is not being scraped from the new pages.

The last point I would like to add is that data is normally scraped as expected from the first and second pages, but from the third page onwards the script simply keeps scraping the data from the second page.

I hope that I've explained my problem well enough, and would very much appreciate any help I could get. Thanks in advance.

You need to switch to the newly opened tab in order to scrape the data from that page:

chwd = driver.window_handles
driver.switch_to.window(chwd[-1])
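
As a minimal sketch (assuming the 'Next' button really does open the results in a new tab), you could scrape in the new tab and then return to the original one:

original = driver.current_window_handle

# switch to the most recently opened tab
driver.switch_to.window(driver.window_handles[-1])

tempname = driver.find_element_by_xpath("//table[@id = 'T01']/tbody/tr[1]/td[2]").text

# close the extra tab and switch back to the original one
driver.close()
driver.switch_to.window(original)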

Without telling us the precise URL with which you are having the problem (why not?), it is more difficult to test these solutions. But the problem seems to be that each page has the same element that can be found with...

elem = driver.find_element_by_xpath("//table[@id = 'T01']/tbody/tr[1]/td[2]")

... and when you do driver.find_element_by_xpath("//button[@title = 'Next']").click(), you are not waiting long enough for the next page to load, so instead of finding elem on the next page you are re-finding the element on the current page. What you need to do is wait long enough for the elements of the current page to become "stale":

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import time

class WaitUntilElementIsStale:
    def __init__(self, *, driver=None, element=None, timeout=10):
        assert driver or element
        self.element = driver.find_element_by_tag_name('html') if element is None else element
        self.timeout = timeout

    def __enter__(self):
        return None

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is not None:
            return
        start_time = time.time()
        while time.time() < start_time + self.timeout:
            try:
                # poll the element with an arbitrary call; once the old page
                # unloads, this raises StaleElementReferenceException
                self.element.find_elements_by_id("doesn't-matter")
            except StaleElementReferenceException:
                return
            time.sleep(0.1)
        raise Exception('Timeout waiting for page load')

y = 0
while (y < 10):
    tempname = driver.find_element_by_xpath("//table[@id = 'T01']/tbody/tr[1]/td[2]").text

    button = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, "//button[@title = 'Next']"))
    )
    with WaitUntilElementIsStale(element=button, timeout=60):
        button.click()  # wait for the button to go "stale" for up to 60 seconds; increase the timeout if necessary

    y += 1
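
As an alternative sketch, Selenium's built-in EC.staleness_of condition can replace the custom context manager, assuming the old table cell is discarded when the next page loads:

# wait for the previously found cell to go stale instead of using the context manager
y = 0
while y < 10:
    cell = driver.find_element_by_xpath("//table[@id = 'T01']/tbody/tr[1]/td[2]")
    tempname = cell.text

    driver.find_element_by_xpath("//button[@title = 'Next']").click()

    # blocks until the reference to the old cell raises StaleElementReferenceException,
    # i.e. the next page has actually replaced the old one
    WebDriverWait(driver, 60).until(EC.staleness_of(cell))

    y += 1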
