Python: download files by links stored in a CSV

As a newbie in Python (2.7), I'm looking for a suggestion on the following:

I have a CSV file with HTTP links stored in a single column, each followed by a comma:

http://example.com/file.pdf,
http://example.com/file.xls,
http://example.com/file.xlsx,
http://example.com/file.doc,

The main aim is to loop through all these links and download each file, keeping its original name and extension.

My search results and the help I found here gave me the following script:

import urllib2
import pandas as pd 

links = pd.read_csv('links.csv', sep=',', header =(0))

url = links                   # I know this part is wrong, but I don't know how to do it right

user_agent = 'Mozilla 5.0 (Windows 7; Win64; x64)'

file_name = "tessst"          # placeholder name; how do I get the files' original names?

u = urllib2.Request(url, headers = {'User-Agent' : user_agent})
req = urllib2.urlopen(u)
f = open(file_name, 'wb')
f.write(req.read())

f.close()

Any help would be appreciated.

PS: I'm not sure about pandas. Maybe the csv module would be better?

If I can assume your CSV file has only one column, containing the links, then this would work.

import csv
import os
import sys
import urllib2

filename = 'test.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            # Skip blank rows and rows whose first cell is not a link
            if not row or 'http' not in row[0]:
                continue
            url = row[0]
            # The file name is everything after the last '/'
            name = url.rsplit('/', 1)[-1]
            # Check before downloading so existing files are not fetched again
            if os.path.exists("./" + name):
                print "file: ", name, "already exists"
                continue
            res = urllib2.urlopen(urllib2.Request(url))
            out = open("./" + name, 'wb')
            out.write(res.read())
            out.close()
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
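
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with the standard library only. This is a rough sketch, not a drop-in replacement: the helper names read_links, filename_from_url and download are made up for illustration, and the User-Agent string is simply the one from the question. It assumes, as above, that each link sits in the first column of the CSV.

```python
import csv
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit

# User-Agent copied from the question; any value the server accepts would do.
USER_AGENT = "Mozilla 5.0 (Windows 7; Win64; x64)"


def read_links(csv_path):
    """Return the first cell of every row that looks like an HTTP(S) link."""
    links = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            # Blank lines come back as empty lists, so guard before indexing.
            if row and row[0].strip().startswith(("http://", "https://")):
                links.append(row[0].strip())
    return links


def filename_from_url(url):
    """Return the last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def download(url, dest_dir="."):
    """Save url into dest_dir under its original name; skip existing files."""
    name = filename_from_url(url)
    if not name:
        raise ValueError("no file name in URL: %s" % url)
    target = os.path.join(dest_dir, name)
    if os.path.exists(target):
        return target
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # Stream the response to disk instead of reading it all into memory.
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With these in place, the whole job would be a loop such as `for url in read_links('links.csv'): download(url)`. Splitting the URL with urlsplit rather than searching for the last '/' keeps a query string like `?id=3` out of the saved file name.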
