Download and rename attachments from a CSV using WGET or Python, with basic authentication required

Question

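I scraped a ticketing website that we were using, and I now have a CSV file that looks like this: ID, Attachment_URL, Ticket_URL. What I need to do now is download every attachment and rename the file using the Ticket_URL. My main problem is that navigating to the Attachment_URL requires basic authentication, after which you are redirected to an aws s3 link. I have been able to download individual files with wget, but I cannot iterate through the whole list (about 35k rows), and I am not sure how to name each file after its ticket_id. Any advice would be appreciated.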

Answer 1

Got it.

To open an authenticated session and collect the links:

# -*- coding: utf-8 -*-
import csv
import re
import time

import requests
from bs4 import BeautifulSoup

s = requests.session()

# Login form fields; fill in your own credentials.
payload = {
    'user': '',
    'pw': ''
}

# Log in once so the session keeps the authentication cookies.
s.post('login.url.here', data=payload)

# Open the output once in append mode instead of once per row.
with open('fd.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    for i in range(1, 6000):
        page = s.get('https://urlhere.com/efw/stuff&page={}'.format(i))

        soup = BeautifulSoup(page.content, 'html.parser')
        table = soup.find("table", {"class": "table-striped"})
        table_body = table.find('tbody')
        rows = table_body.find_all('tr')[1:]  # skip the first row
        print("The current page is: " + str(i))

        for row in rows:
            # Keep only the links that point into /helpdesk/.
            cols = row.find_all('a', attrs={'href': re.compile("^/helpdesk/")})
            # time.sleep(1)  # uncomment to throttle requests
            # Note: this writes the whole <a> tags, not just the URLs.
            writer.writerow(cols)
            print(cols)
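
A side note on the script above: it writes the matched `<a>` tags themselves into fd.csv, which is why the links need cleaning afterwards. If you would rather write plain URLs in the first place, something along these lines might do. This is only a sketch that swaps BeautifulSoup for the standard library's html.parser; the names `HelpdeskLinkParser` and `helpdesk_links` are made up for illustration, and the base URL is the placeholder from the script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HelpdeskLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags whose href starts with /helpdesk/."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith("/helpdesk/"):
            # Resolve the relative link against the site root.
            self.links.append(urljoin(self.base_url, href))


def helpdesk_links(html, base_url):
    """Return the /helpdesk/ links found in one page of HTML."""
    parser = HelpdeskLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

With BeautifulSoup itself, reading `a['href']` from each matched tag before calling `writer.writerow` should give the same result.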

Then I cleaned up the links in R and downloaded the files.

#!/usr/bin/env python
import os
import threading
from queue import Empty, Queue
from time import gmtime, strftime

import requests

s = requests.session()

# Login form fields; fill in your own credentials.
payload = {
    'user': '',
    'pw': ''
}
s.post('login', data=payload)


class Log:
    """Minimal timestamped logger that prints to stdout."""

    def info(self, message):
        self.__message("info", message)

    def error(self, message):
        self.__message("error", message)

    def debug(self, message):
        self.__message("debug", message)

    def __message(self, log_level, message):
        date = strftime("%Y-%m-%d %H:%M:%S", gmtime())
        print("%s [%s] %s" % (date, log_level, message))


class Fetch:
    def __init__(self):
        self.temp_dir = "/tmp"

    def run_fetcher(self, queue):
        while True:
            try:
                # get_nowait avoids blocking forever when another worker
                # takes the last item between an empty() check and get().
                url, ticketid = queue.get_nowait()
            except Empty:
                return

            try:
                # Name the file after the last segment of the ticket URL,
                # or after the attachment URL when there is no ticket.
                if ticketid.endswith("NA"):
                    fileName = url.split("/")[-1] + 'NoTicket'
                else:
                    fileName = ticketid.split("/")[-1]

                response = s.get(url)

                with open(os.path.join('/Users/Desktop/FolderHere', fileName + '.mp3'), 'wb') as f:
                    f.write(response.content)
                print(fileName)
            finally:
                # Always mark the item done so q.join() cannot hang.
                queue.task_done()


if __name__ == '__main__':

    # load in classes
    q = Queue()
    log = Log()
    fe = Fetch()

    # Read in input file: one "id,url,ticket" row per attachment
    with open('/Users/name/csvfilehere.csv', 'r') as csvfile:
        for line in csvfile:
            _id, url, ticket = line.split(",")
            q.put([url.strip(), ticket.strip()])

    # spin up fetcher workers
    threads = []
    for i in range(8):
        t = threading.Thread(target=fe.run_fetcher, args=(q,))
        t.daemon = True
        threads.append(t)
        t.start()

    # wait for the workers to finish
    for t in threads:
        t.join()

    # wait until every queued item has been marked done
    q.join()
    log.info("End")
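
For anyone whose Attachment_URL really sits behind HTTP basic authentication, as described in the question, rather than a login form, here is a more compact sketch of the same idea. It is not what I ran. It assumes the CSV rows are ID, Attachment_URL, Ticket_URL, keeps the .mp3 extension and the "NoTicket" fallback from the script above, and the names `target_name`, `read_jobs`, `download` and `main` are invented for this example. As far as I know, requests follows the redirect to S3 on its own and does not forward the Authorization header when the redirect goes to a different host.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def target_name(attachment_url, ticket_url, ext=".mp3"):
    """Name the file after the last path segment of the ticket URL."""
    if ticket_url.endswith("NA"):
        # No ticket recorded: fall back to the attachment id.
        return attachment_url.rstrip("/").split("/")[-1] + "NoTicket" + ext
    return ticket_url.rstrip("/").split("/")[-1] + ext


def read_jobs(csv_lines):
    """Yield (attachment_url, ticket_url) from rows of ID, Attachment_URL, Ticket_URL."""
    for row in csv.reader(csv_lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        _id, url, ticket = (cell.strip() for cell in row)
        yield url, ticket


def download(session, out_dir, url, ticket):
    """Fetch one attachment and stream it to disk; return the saved path."""
    path = os.path.join(out_dir, target_name(url, ticket))
    with session.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=65536):
                f.write(chunk)
    return path


def main(csv_path, out_dir, user, password):
    # Third-party dependency; imported here so the helpers above
    # only need the standard library.
    import requests

    session = requests.Session()
    session.auth = (user, password)  # assumption: HTTP basic auth, not a login form
    with open(csv_path, newline="") as f:
        jobs = list(read_jobs(f))
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(download, session, out_dir, u, t) for u, t in jobs]
        for future in futures:
            try:
                print(future.result())
            except Exception as exc:
                # One failed download should not stop the other rows.
                print("failed:", exc)
```

You would call it as `main('/Users/name/csvfilehere.csv', '/Users/Desktop/FolderHere', 'user', 'pw')`. Sharing one session across threads mirrors the script above; if that causes trouble, creating one session per worker is a reasonable alternative.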

Solution 1
0 Accepted 2017-01-23 08:02:17
