Python download files by links stored in csv
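As a newcomer to Python (2.7), I'm looking for some advice on the following:
I have a CSV file in which HTTP links are stored in a single column, each followed by a comma: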
http://example.com/file.pdf,
http://example.com/file.xls,
http://example.com/file.xlsx,
http://example.com/file.doc,
The main goal is to iterate over all these links and download each file, keeping its original name and extension.
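The "original name and extension" can usually be taken from the last path segment of each URL. Below is a minimal sketch in Python 3 (where `urllib.parse` provides URL splitting; `urllib2` exists only in Python 2). It assumes the name is actually present in the URL path; servers that supply the name only through a `Content-Disposition` header are not handled. `filename_from_url` is a hypothetical helper name.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of url, e.g. 'file.pdf' (hypothetical helper)."""
    # urlsplit separates the query string and fragment from the path
    path = urlsplit(url.strip()).path
    # URL paths always use '/', so posixpath is correct on every OS;
    # unquote turns '%20' and similar escapes back into characters
    return unquote(posixpath.basename(path))
```

For example, `filename_from_url('http://example.com/file.pdf')` gives `'file.pdf'`, and a query string such as `?dl=1` is ignored. A URL ending in `/` yields an empty string, so callers should skip it.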
Searching, plus some help found here, gave me the following script:
```python
import urllib2
import pandas as pd
links = pd.read_csv('links.csv', sep=',', header =(0))
url = links # I know this part is wrong but don't know how to do it right
user_agent = 'Mozilla 5.0 (Windows 7; Win64; x64)'
file_name = "tessst" # the file name goes here, but how do I get the original names?
u = urllib2.Request(url, headers = {'User-Agent' : user_agent})
req = urllib2.urlopen(u)
f = open(file_name, 'wb')
f.write(req.read())
f.close()
```
Any help would be appreciated.
PS: I'm not sure about pandas; maybe the csv module is better?
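On the pandas-vs-csv question: in the script above, `header=(0)` tells pandas to use the first row as column names, so the first link would be swallowed as a header. With pandas, `header=None` avoids that, and column `0` then holds the links (e.g. `pd.read_csv('links.csv', header=None)[0].tolist()`). For a single column, though, the standard `csv` module is enough. A minimal Python 3 sketch, assuming one link per row in the first column; `read_links` is a hypothetical name:

```python
import csv


def read_links(lines):
    """Yield the URL in the first column of each CSV row (hypothetical helper)."""
    for row in csv.reader(lines):
        # Blank lines come back as []; trailing commas add an empty second field
        if row and row[0].strip().startswith(('http://', 'https://')):
            yield row[0].strip()
```

Usage: `with open('links.csv', newline='') as f: urls = list(read_links(f))`. Passing any iterable of lines keeps the helper easy to test without a real file.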
Assuming your CSV file has only one column containing the links, this will work (Python 2.7):
```python
import csv
import os
import sys
import urllib2

filename = 'test.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            # Skip blank lines and cells that don't look like links
            if row and 'http' in row[0]:
                url = row[0].strip()
                # Original file name = everything after the last '/'
                name = url.rsplit('/', 1)[-1]
                path = os.path.join('.', name)
                if not os.path.exists(path):
                    # Only download when the file isn't already on disk
                    res = urllib2.urlopen(urllib2.Request(url))
                    with open(path, 'wb') as out:
                        out.write(res.read())
                else:
                    print "file: ", name, "already exists"
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
```
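In Python 3, `urllib2` was split into `urllib.request` and `urllib.error`, so the loop above needs porting. The following is a rough sketch, not the answer's exact code: `download_all`, `dest_dir`, and the default `user_agent` value are hypothetical choices. It keeps the User-Agent header from the question (some servers reject urllib's default agent), skips files that already exist, and streams each response to disk with `shutil.copyfileobj` instead of reading it all into memory.

```python
import csv
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import unquote, urlsplit


def download_all(csv_path, dest_dir='.', user_agent='Mozilla/5.0'):
    """Download each URL in the first CSV column into dest_dir (hypothetical helper).

    Returns the list of paths that were actually written.
    """
    saved = []
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            # Skip blank lines and cells without a URL scheme
            if not row or '://' not in row[0]:
                continue
            url = row[0].strip()
            # Keep the original name and extension from the URL path
            name = unquote(posixpath.basename(urlsplit(url).path))
            if not name:
                continue  # URL ends in '/', so there is no file name to reuse
            target = os.path.join(dest_dir, name)
            if os.path.exists(target):
                print('file:', name, 'already exists')
                continue
            req = urllib.request.Request(url, headers={'User-Agent': user_agent})
            # Stream the body to disk in chunks
            with urllib.request.urlopen(req) as res, open(target, 'wb') as out:
                shutil.copyfileobj(res, out)
            saved.append(target)
    return saved
```

Because `urlopen` also accepts `file://` URLs, the function can be tried offline by listing local files (e.g. via `pathlib.Path.as_uri()`) in the CSV before pointing it at real HTTP links.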