How to download all the files through the direct links stored in a pandas dataframe using Selenium and Python

I have the following pandas dataframe:

filename   Direct url
file1.pdf  www.abc.com/file1.pdf, www.abc.com/file3.pdf
file2.pdf  www.abc.com/file2.pdf

I want to download these files directly using Python and Selenium on the Firefox browser. I have written the following code:

dl_dir = "path/to/dl/folder"

ff_profile = webdriver.FirefoxProfile()

ff_profile.set_preference("browser.download.folderList", 2)
ff_profile.set_preference("browser.download.manager.showWhenStarting", False)
ff_profile.set_preference("browser.download.dir", dl_dir)
ff_profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/pf")

driver = webdriver.Firefox(ff_profile)

for index, row in df.iterrows():
    dir_urls=df[df["filename"]==row["filename"]]["Direct url"].iat[0]
    url_lst=dir_urls.split(",")
    for i in url_lst:
        driver.get(i)


But I am getting a timeout exception saying Timeout loading page, even though I'm not trying to open any webpage here; I'm just trying to auto-download the files. Currently, I'm only able to download 1 file, because the timeout exception is raised after that. How can I circumvent this and download all the files?

I would suggest that you increase the page-load timeout:

driver.set_page_load_timeout(30)  # seconds

And you could wrap each request in a try/except statement, so that one timeout does not stop the remaining downloads:

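from selenium.common.exceptions import TimeoutException

for i in url_lst:
    try:
        driver.get(i)
    except TimeoutException:
        # Skip this URL when the page-load timeout fires and try the next one
        continue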
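
To make the per-URL error handling concrete, here is a minimal sketch of the whole loop. The helper names `split_urls` and `fetch_all` are hypothetical (they are not part of Selenium or pandas), and the sketch assumes each "Direct url" cell is a comma-separated string as in the dataframe above. The exception type is passed in as a parameter, so in real use you would supply Selenium's `TimeoutException`.

```python
def split_urls(cell):
    """Split a comma-separated 'Direct url' cell into clean URLs.

    Stripping whitespace matters here: "a.pdf, b.pdf".split(",")
    would otherwise yield " b.pdf" with a leading space.
    """
    return [u.strip() for u in cell.split(",") if u.strip()]


def fetch_all(driver, url_cells, timeout_error=Exception):
    """Call driver.get() on every URL, continuing past timeouts.

    `driver` is any object with a get(url) method (assumed to be a
    Selenium WebDriver in practice). Returns two lists: URLs whose
    get() call returned normally, and URLs that raised `timeout_error`.
    """
    loaded, timed_out = [], []
    for cell in url_cells:
        for url in split_urls(cell):
            try:
                driver.get(url)
                loaded.append(url)
            except timeout_error:
                # The try/except sits inside the loop, so one timeout
                # does not abort the remaining URLs.
                timed_out.append(url)
    return loaded, timed_out
```

With the setup from the question, this could be called as `fetch_all(driver, df["Direct url"], TimeoutException)`. Since a timeout on a direct file link does not necessarily mean the download failed, the second list is probably best treated as "check these files in the download folder" rather than as definite failures.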
