How to download all the files through the direct links stored in a pandas dataframe using Selenium and Python
I have the following pandas dataframe.
filename    Direct url
file1.pdf   www.abc.com/file1.pdf, www.abc.com/file3.pdf
file2.pdf   www.abc.com/file2.pdf
I want to download these files directly using Python and Selenium on the Firefox browser. I have written the following code:
from selenium import webdriver

dl_dir = "path/to/dl/folder"
ff_profile = webdriver.FirefoxProfile()
# 2 = save downloads to the custom folder set in browser.download.dir
ff_profile.set_preference("browser.download.folderList", 2)
ff_profile.set_preference("browser.download.manager.showWhenStarting", False)
ff_profile.set_preference("browser.download.dir", dl_dir)
# MIME types to save without asking ("application/pf" corrected to "application/pdf")
ff_profile.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/pdf")
driver = webdriver.Firefox(ff_profile)

# df is the dataframe shown above
for index, row in df.iterrows():
    dir_urls = df[df["filename"] == row["filename"]]["Direct url"].iat[0]
    url_lst = dir_urls.split(",")
    for i in url_lst:
        driver.get(i)
But I am getting a timeout exception saying "Timeout loading page", even though I'm not trying to open any webpage here; I'm just trying to auto-download the files. Currently, I'm only able to download 1 file, because it raises a timeout exception after that. How can I circumvent this and download all the files?
I would suggest that you increase the page load timeout:
driver.set_page_load_timeout(30) # seconds
And you could wrap each request in a try/except statement, so that one timeout does not stop the remaining downloads:
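from selenium.common.exceptions import TimeoutException

for i in url_lst:
    try:
        driver.get(i)
    except TimeoutException:
        # This URL timed out; move on to the next one
        continue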
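One more thing that may be worth checking: the URLs in the dataframe have no scheme, and splitting on "," leaves a leading space on the second URL. As far as I know, driver.get generally expects a full URL including the scheme. Below is a minimal sketch of how the cells could be cleaned up and looped over so that a failing URL is recorded rather than ending the run. The names split_urls and fetch_all are hypothetical helpers of mine, and the "https" default scheme is an assumption about your server.

```python
def split_urls(cell, default_scheme="https"):
    """Split a comma-separated cell into clean URLs that include a scheme."""
    urls = []
    for part in cell.split(","):
        url = part.strip()  # drop the space left after each comma
        if not url:
            continue
        if "://" not in url:
            # Assumption: the files are served over default_scheme
            url = f"{default_scheme}://{url}"
        urls.append(url)
    return urls


def fetch_all(get, cells, skip_on=(Exception,)):
    """Call get(url) for every URL in every cell and return the URLs that raised."""
    failed = []
    for cell in cells:
        for url in split_urls(cell):
            try:
                get(url)
            except skip_on:
                # Remember the failure and keep going with the next URL
                failed.append(url)
    return failed
```

With Selenium this could be called as fetch_all(driver.get, df["Direct url"], skip_on=(TimeoutException,)). The returned list tells you which URLs timed out, so you can retry them or check the download folder for them afterwards.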