
Can't keep clicking on the next page button while parsing the links

I've written a script in Python, in combination with Selenium, that clicks the search button to populate results, parses the different links (from class ya_result-item) off the landing page, and then keeps clicking the next page button while parsing links from the subsequent pages, until there is no button left to click.

However, my script can only parse the links from the first page; it clicks the next page button once but then gets stuck.

website link

How can I make my script keep clicking on the next page button while parsing the links?

This is my attempt:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.yogaalliance.org/Directory?Type=School"

def get_page_content(driver, link):
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    # Click the search button to populate the results
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a.ya_directory-search-button"))).click()
    while True:
        # Collect the school profile links on the current result page
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='ya_result-item'] a[href^='/SchoolPublicProfile']"))):
            print(item.get_attribute("href"))

        try:
            # Go to the next page, then wait for the old results to go stale
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).click()
            wait.until(EC.staleness_of(item))
        except Exception:
            break

if __name__ == '__main__':
    driver = webdriver.Chrome()
    get_page_content(driver, url)

I printed out the Exception and it says that the element is not clickable. Instead of clicking on it, an alternative is to use send_keys("\n") to emulate the link click:

wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).send_keys("\n")

I tried this out and I am able to navigate to all the pages.

If you want to scrape the data, there is no need for Selenium. You can use the requests package to get all the information in JSON format, which is much faster.

The code below collects all the schools with their details as a list of dicts in result:

import math
import requests

url = 'https://www.yogaalliance.org/DesktopModules/YAServices/API/SchoolDirectory/SearchSchools'

# Payload for the first page of 10 results
data = {
    'take': '10',
    'skip': '0',
    'page': '1',
    'pageSize': '10',
    'pageIndex': '0'
}
response = requests.post(url, data=data)

result = response.json()["Result"]
totalCount = response.json()["TotalCount"]
# Round up, so a partial last page is not skipped
totalPages = math.ceil(totalCount / 10)

# Fetch the remaining pages and append their results
for i in range(1, totalPages):
    data['skip'] = str(i * 10)
    data['page'] = str(i + 1)
    data['pageIndex'] = str(i)
    response = requests.post(url, data=data)
    result.extend(response.json()["Result"])

print(result)
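The pagination above hinges on one piece of arithmetic: the number of pages is the total result count divided by the page size, rounded up. As a minimal sketch of that logic, here is a small helper (the function name and payload shape are illustrative, not part of the original answer) that builds one request payload per page; it can be checked without hitting the API at all:

```python
import math

def build_payloads(total_count, page_size=10):
    """Return one POST payload per result page, mirroring the API's fields."""
    pages = math.ceil(total_count / page_size)  # round up: 95 results -> 10 pages
    return [
        {
            'take': str(page_size),
            'skip': str(i * page_size),
            'page': str(i + 1),
            'pageIndex': str(i),
            'pageSize': str(page_size),
        }
        for i in range(pages)
    ]

payloads = build_payloads(95)  # hypothetical TotalCount
# 10 payloads; the last page starts at skip=90
```

Using int(totalCount / 10) instead of math.ceil would drop the final partial page whenever the total is not an exact multiple of the page size.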
