
How to scrape some links from a website using selenium

I've been trying to parse the links ending with 20012019.csv from a webpage using the script below, but I keep getting a timeout exception error. As far as I can tell, I did things the right way.

However, any insight as to where I'm going wrong will be highly appreciated.

My attempt so far:

from selenium import webdriver

url = 'https://promo.betfair.com/betfairsp/prices'

def get_info(driver,link):
    driver.get(link)
    for item in driver.find_elements_by_css_selector("a[href$='20012019.csv']"):
        print(item.get_attribute("href"))

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_info(driver,url)
    finally:
        driver.quit()

Your code is fine (I tried it and it works). The reason you get a timeout is that the default timeout is 60s, according to this answer, and the page is huge.

Add this to your code before making the get request (to wait 180s before timing out):

driver.set_page_load_timeout(180)

You were close. You have to induce WebDriverWait for the visibility of all elements located, and you need to change the line:

for item in driver.find_elements_by_css_selector("a[href$='20012019.csv']"):

to:

for item in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='20012019.csv']"))):

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
