
Can't keep clicking on the next page button while parsing the links

I've written a script in Python, in combination with Selenium, that clicks the search button to populate results, parses the different links (from class ya_result-item) off the landing page, and then keeps clicking the next page button while parsing links from the subsequent pages, until there is no button left to click.

However, my script can only parse the links from the first page; it clicks the next page button once but then gets stuck.

website link

How can I make my script keep clicking on the next page button while parsing the links?

This is my attempt:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.yogaalliance.org/Directory?Type=School"

def get_page_content(driver, link):
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    # Click the search button to populate the results
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a.ya_directory-search-button"))).click()
    while True:
        # Collect the school profile links on the current result page
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='ya_result-item'] a[href^='/SchoolPublicProfile']"))):
            print(item.get_attribute("href"))

        try:
            # Go to the next page, then wait for the old results to go stale
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).click()
            wait.until(EC.staleness_of(item))
        except Exception:
            break

if __name__ == '__main__':
    driver = webdriver.Chrome()
    get_page_content(driver, url)

I printed out the Exception and it says that the element is not clickable. Instead of clicking on it, an alternative is to use send_keys("\n") to emulate the link click:

wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).send_keys("\n")

I tried this out and I am able to navigate to all the pages.

If you want to scrape the data, there is no need for Selenium. You can use the requests package to get all the information in JSON format, which is much faster.

The code below collects all the schools with their details as a list of dicts in result:

import math
import requests

url = 'https://www.yogaalliance.org/DesktopModules/YAServices/API/SchoolDirectory/SearchSchools'

# Payload for the first page of 10 results
data = {
    'take': '10',
    'skip': '0',
    'page': '1',
    'pageSize': '10',
    'pageIndex': '0'
}
response = requests.post(url, data=data)

result = response.json()["Result"]
totalCount = response.json()["TotalCount"]
# Round up, so a partial last page is not skipped
totalPages = math.ceil(totalCount / 10)

# Fetch the remaining pages and append their results
for i in range(1, totalPages):
    data['skip'] = str(i * 10)
    data['page'] = str(i + 1)
    data['pageIndex'] = str(i)
    response = requests.post(url, data=data)
    result.extend(response.json()["Result"])

print(result)
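The pagination above hinges on one piece of arithmetic: the number of pages is the total result count divided by the page size, rounded up. As a minimal sketch of that logic, here is a small helper (the function name and payload shape are illustrative, not part of the original answer) that builds one request payload per page; it can be checked without hitting the API at all:

```python
import math

def build_payloads(total_count, page_size=10):
    """Return one POST payload per result page, mirroring the API's fields."""
    pages = math.ceil(total_count / page_size)  # round up: 95 results -> 10 pages
    return [
        {
            'take': str(page_size),
            'skip': str(i * page_size),
            'page': str(i + 1),
            'pageIndex': str(i),
            'pageSize': str(page_size),
        }
        for i in range(pages)
    ]

payloads = build_payloads(95)  # hypothetical TotalCount
# 10 payloads; the last page starts at skip=90
```

Using int(totalCount / 10) instead of math.ceil would drop the final partial page whenever the total is not an exact multiple of the page size.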
