Python, basic question: how to download multiple URLs with urllib.request.urlretrieve

Question

I have the following fully working code:

import urllib.request
import zipfile

url = "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"
filename = "C:/test/archive.zip"
destinationPath = "C:/test"

urllib.request.urlretrieve(url,filename)
sourceZip = zipfile.ZipFile(filename, 'r')

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()

It runs perfectly a few times, but the server I retrieve the file from has some limits, and once the daily limit is reached I get this error:

Traceback (most recent call last):
  File "script.py", line 11, in <module>
    urllib.request.urlretrieve(url,filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1591, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

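How can I change the script so that it uses a list of several URLs instead of a single one, keeps trying to download from the list until one succeeds, and then continues with the unzipping? I only need one successful download.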

Sorry, I'm very new to Python, but I can't figure this out. I assume I have to change the variable to something like this:

url = {
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
}

and then turn this line into some kind of loop:

urllib.request.urlretrieve(url,filename)
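One detail in the `url = {...}` guess above: curly braces build a `set`, which has no guaranteed iteration order and silently drops duplicates, so the URLs would not necessarily be tried in the order written. Square brackets build a `list`, which keeps the order. A small illustration (the example.com URLs are placeholders):

```python
# Braces create a set: unordered, duplicates removed.
url_set = {
    "http://example.com/a.zip",
    "http://example.com/b.zip",
    "http://example.com/a.zip",  # duplicate, silently dropped
}

# Brackets create a list: order preserved, tried top to bottom.
url_list = [
    "http://example.com/a.zip",
    "http://example.com/b.zip",
    "http://example.com/c.zip",
]

print(type(url_set).__name__, len(url_set))  # set 2
for url in url_list:
    print(url)  # always a.zip, then b.zip, then c.zip
```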

Answer 1

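You want to put your URLs in a list, then iterate over that list and try each one. Catch and ignore the exceptions they raise, and break out of the loop once one succeeds. Try this: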

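import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop", "other url", "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urls:
    try:
        urllib.request.urlretrieve(url, filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break  # success: stop trying the remaining URLs
    # ValueError is the error from the question; OSError also covers
    # urllib.error.URLError / HTTPError; BadZipFile covers a non-zip response
    except (ValueError, OSError, zipfile.BadZipFile):
        pass  # this URL failed; move on to the next one
else:
    # the loop never reached "break", so every URL failed
    raise SystemExit("All downloads failed")

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()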
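The same "try each URL, stop at the first success" pattern can be wrapped in a function, which also makes the case where every URL fails explicit. A minimal sketch: the name `download_first` and the injectable `retrieve` parameter are illustrative, not part of urllib. It catches `ValueError` (the error in the question) and `OSError`, which `urllib.error.URLError` and `HTTPError` derive from.

```python
import urllib.request


def download_first(urls, filename, retrieve=urllib.request.urlretrieve):
    """Try each URL in order; return the first one that downloads successfully.

    Raises RuntimeError if every URL fails. `retrieve` defaults to
    urlretrieve but can be swapped out (e.g. for testing).
    """
    errors = []
    for url in urls:
        try:
            retrieve(url, filename)
        except (ValueError, OSError) as exc:
            errors.append((url, exc))  # remember why it failed, try the next one
            continue
        return url  # first success: stop here
    raise RuntimeError(f"all {len(errors)} downloads failed: {errors}")
```

Usage: `good_url = download_first(urls, "C:/test/test.zip")`, then open the file with `zipfile.ZipFile` as before.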

Answer 2

import urllib.request
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
            "another",
            "yet another",
            "etc")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url,filename)
    except ValueError:
        continue
    sourceZip = zipfile.ZipFile(filename, 'r')

    for name in sourceZip.namelist():
        sourceZip.extract(name, destinationPath)
    sourceZip.close()
    break

This works assuming you want to try each URL once, until one of them works, and then stop.
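A variant of the loop above, assuming Python 3: `zipfile.ZipFile` works as a context manager, and `extractall()` replaces the `namelist()`/`extract()` loop. It also catches `zipfile.BadZipFile`, in case a server returns an error page instead of an archive (an assumption about how the limit might show up, not something stated in the question). The function name `fetch_and_extract` is hypothetical.

```python
import urllib.request
import zipfile


def fetch_and_extract(urls, filename, destination):
    """Download from the first working URL, unzip it, and return that URL.

    Returns None if no URL yields a valid zip archive.
    """
    for url in urls:
        try:
            urllib.request.urlretrieve(url, filename)
            with zipfile.ZipFile(filename) as archive:
                archive.extractall(destination)
        except (ValueError, OSError, zipfile.BadZipFile):
            continue  # download or archive failed; try the next URL
        return url  # success: downloaded and extracted
    return None
```

With the list from this answer: `fetch_and_extract(urllist, "C:/test/test.zip", "C:/test")`. Returning `None` instead of raising lets the caller decide what "all failed" should mean.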

Answer 3

For full-blown distributed tasks, you could look at Celery and its retry mechanism, Celery-retry.

Or you could look at Retry-decorator, for example:

import math
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
  """Retries a function or method until it returns True.

  delay sets the initial delay, and backoff sets how much the delay should
  lengthen after each failure. backoff must be greater than 1, or else it
  isn't really a backoff. tries must be at least 0, and delay greater than
  0."""

  if backoff <= 1:
    raise ValueError("backoff must be greater than 1")

  tries = math.floor(tries)
  if tries < 0:
    raise ValueError("tries must be 0 or greater")

  if delay <= 0:
    raise ValueError("delay must be greater than 0")

  def deco_retry(f):
    def f_retry(*args, **kwargs):
      mtries, mdelay = tries, delay # make mutable

      rv = f(*args, **kwargs) # first attempt
      while mtries > 0:
        if rv == True: # Done on success
          return True

        mtries -= 1      # consume an attempt
        time.sleep(mdelay) # wait...
        mdelay *= backoff  # make future wait longer

        rv = f(*args, **kwargs) # Try again

      return False # Ran out of tries :-(

    return f_retry # true decorator -> decorated function
  return deco_retry  # @retry(arg[, ...]) -> true decorator
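The decorator above retries based on the return value (it stops when the function returns `True`), whereas `urlretrieve` signals failure by raising. To use it for downloads you need a wrapper that turns exceptions into `True`/`False`. A self-contained sketch of that idea using a plain function instead of a decorator; `retry_call` and `try_download` are hypothetical names, and unlike the original (one first attempt plus up to `tries` retries), `tries` here counts total attempts.

```python
import time
import urllib.request


def try_download(url, filename, retrieve=urllib.request.urlretrieve):
    """Return True if the download succeeded, False if it raised."""
    try:
        retrieve(url, filename)
        return True
    except (ValueError, OSError):
        return False


def retry_call(func, tries=3, delay=3.0, backoff=2.0):
    """Call func() up to `tries` times until it returns True.

    Sleeps `delay` seconds after a failure and multiplies the delay by
    `backoff` each time (exponential backoff). Returns False if all fail.
    """
    for attempt in range(tries):
        if func():
            return True
        if attempt < tries - 1:  # no point sleeping after the last attempt
            time.sleep(delay)
            delay *= backoff
    return False
```

Usage: `retry_call(lambda: try_download(url, "C:/test/test.zip"), tries=4)`. Waiting a few seconds is unlikely to get past a daily quota, so this fits transient network errors better than the limit described in the question; for that, iterating over several URLs as in the other answers is the more direct fix.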

Answer 4

import urllib.request

urls = [
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
]
filename = "C:/test/archive.zip"  # same path as in the question

for u in urls:
    urllib.request.urlretrieve(u, filename)
    # ... rest of code ...


4 solutions

Solution 1: score 3, accepted, 2011-08-10 19:16:32

Solution 2: score 2, 2011-08-10 19:16:19

Solution 3: score 0, 2011-08-10 19:15:22

Solution 4: score 0, 2011-08-10 19:16:10
