
Can't keep clicking on the next page button while parsing the links

I've written a script in Python, in combination with Selenium, that clicks the search button to populate results, parses the different links under the ya_result-item class on the landing page, and then keeps clicking the next-page button while parsing the links from the other pages until there is no button left to click.

However, my script only parses the links from the first page and clicks the next-page button once, but then it gets stuck.

Website link: https://www.yogaalliance.org/Directory?Type=School

How can I make my script keep clicking on the next page button while parsing the links?

This is my attempt:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.yogaalliance.org/Directory?Type=School"

def get_page_content(driver, link):
    driver.get(link)
    # Click the search button to populate the results
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a.ya_directory-search-button"))).click()
    while True:
        # Collect the school profile links on the current page
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='ya_result-item'] a[href^='/SchoolPublicProfile']"))):
            print(item.get_attribute("href"))

        try:
            # Click the next-page button, then wait for the old results to go stale
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).click()
            wait.until(EC.staleness_of(item))
        except Exception:
            break

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    get_page_content(driver, url)

I printed out the exception, and it says the element is not clickable. Instead of clicking on it, an alternative is to use send_keys("\n") to emulate the link click.

wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).send_keys("\n")

I tried this out and was able to navigate to all the pages.
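
For reference, here is how the whole loop looks with that one-line change applied. This is a minimal sketch assuming the rest of the question's script stays the same; the only other difference is that wait is passed in explicitly rather than relied on as a module-level global:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.yogaalliance.org/Directory?Type=School"

def get_page_content(driver, wait, link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a.ya_directory-search-button"))).click()
    while True:
        items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='ya_result-item'] a[href^='/SchoolPublicProfile']")))
        for item in items:
            print(item.get_attribute("href"))
        try:
            # send_keys("\n") follows the link without requiring it to be clickable
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[title*='next page']"))).send_keys("\n")
            wait.until(EC.staleness_of(items[-1]))  # old results detach when the next page loads
        except Exception:
            break

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    get_page_content(driver, wait, url)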

If you want to scrape the data, there's no need for Selenium. You can use the requests package to get all the information in JSON format much faster.

The code below collects all the schools with their details, as a list of dicts in result:

import math
import requests

url = 'https://www.yogaalliance.org/DesktopModules/YAServices/API/SchoolDirectory/SearchSchools'
data = {
    'take': '10',
    'skip': '0',
    'page': '1',
    'pageSize': '10',
    'pageIndex': '0'
}

# Fetch the first page; the response carries the total record count
response = requests.post(url, data=data).json()
result = response["Result"]
totalCount = response["TotalCount"]
totalPages = math.ceil(totalCount / 10)  # round up so a partial last page isn't dropped

# Fetch the remaining pages, advancing the paging parameters each time
for i in range(1, totalPages):
    data['skip'] = str(i * 10)
    data['page'] = str(i + 1)
    data['pageIndex'] = str(i)
    response = requests.post(url, data=data).json()
    result.extend(response["Result"])

print(result)
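
As a quick sanity check (a hypothetical follow-up, continuing from the variables in the block above), you can verify the count and inspect one record's keys before relying on any particular field:

# Continues from the block above; `result` and `totalCount` are already defined.
print(len(result), "of", totalCount, "records collected")
if result:
    print(sorted(result[0].keys()))  # see which fields each school record exposes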
