How to download all files via direct links stored in a pandas dataframe

Question

I have the following pandas dataframe.

filename   Direct url
file1.pdf  www.abc.com/file1.pdf, www.abc.com/file3.pdf
file2.pdf  www.abc.com/file2.pdf

I want to download these files directly using Python and Selenium on the Firefox browser. I have written the following code:

from selenium import webdriver

dl_dir = "path/to/dl/folder"

ff_profile = webdriver.FirefoxProfile()

ff_profile.set_preference("browser.download.folderList", 2)
ff_profile.set_preference("browser.download.manager.showWhenStarting", False)
ff_profile.set_preference("browser.download.dir", dl_dir)
ff_profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/pf")

driver = webdriver.Firefox(ff_profile)

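for index, row in df.iterrows():
    # Each "Direct url" cell holds one or more comma-separated links
    dir_urls = df[df["filename"] == row["filename"]]["Direct url"].iat[0]
    url_lst = dir_urls.split(",")
    for i in url_lst:
        # strip() drops the space left after each comma
        driver.get(i.strip())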

But I am getting a timeout exception saying "Timeout loading page". I'm not trying to open a webpage here, I'm just trying to download the file automatically. Currently I can download only 1 file, because the timeout exception is raised after that. How can I circumvent this and download all the files?

Answer 1

I would suggest that you increase the timeout delay:

driver.set_page_load_timeout(30)  # seconds

And you could use a try/except statement:

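from selenium.common.exceptions import TimeoutException

for i in url_lst:
    try:
        driver.get(i)
    except TimeoutException:
        # Skip a link whose page load times out and move on to the next one
        continue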
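
The try/except idea can be pulled into a small helper so that one failing link does not stop the rest. This is only a sketch, not tested against Firefox: `split_urls`, `fetch_all`, `rows`, and `fake_get` are hypothetical names introduced for illustration, and `fake_get` is a stand-in for `driver.get` that raises the built-in `TimeoutError` so the example runs without a browser.

```python
def split_urls(cell):
    """Split a comma-separated "Direct url" cell into clean URLs."""
    return [u.strip() for u in cell.split(",") if u.strip()]


def fetch_all(urls, fetch, skip_on=(Exception,)):
    """Call fetch(url) for each URL; collect the ones that raise skip_on."""
    failed = []
    for url in urls:
        try:
            fetch(url)
        except skip_on:
            # One bad link should not stop the remaining downloads
            failed.append(url)
    return failed


# Rows mirroring the dataframe in the question
rows = [
    {"filename": "file1.pdf",
     "Direct url": "www.abc.com/file1.pdf, www.abc.com/file3.pdf"},
    {"filename": "file2.pdf", "Direct url": "www.abc.com/file2.pdf"},
]


def fake_get(url):
    # Hypothetical stand-in for driver.get: times out on one link
    if url.endswith("file3.pdf"):
        raise TimeoutError("Timeout loading page")


all_urls = [u for row in rows for u in split_urls(row["Direct url"])]
failed = fetch_all(all_urls, fake_get, skip_on=(TimeoutError,))
print(failed)  # ['www.abc.com/file3.pdf']
```

With Selenium, you would presumably pass `driver.get` as `fetch` and `selenium.common.exceptions.TimeoutException` in `skip_on`, and build `all_urls` from `df["Direct url"]` instead of `rows`. The returned list tells you which links to retry or inspect by hand.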

Answer 1
0 2021-03-25 09:23:17
