Python, basic question: How do I download multiple url's with urllib.request.urlretrieve

I have the following fully functional, working code:

import urllib.request
import zipfile

url = "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"
filename = "C:/test/archive.zip"
destinationPath = "C:/test"

urllib.request.urlretrieve(url,filename)
sourceZip = zipfile.ZipFile(filename, 'r')

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()

It will work perfectly a few times, but because the server I am retrieving the file from has some limits, I get this error once I reach that daily limit:

Traceback (most recent call last):
  File "script.py", line 11, in <module>
    urllib.request.urlretrieve(url,filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1591, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

How do I alter the script so that it takes a list of multiple url's instead of one single url, keeps trying to download from the list until one succeeds, and then continues with the unzip? I just need one successful download.

Apologies for being very new to Python, but I can't figure this one out. I'm assuming I have to change the variable to look something like this:

url = {
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
}

and then changing this line into some sort of loop:

urllib.request.urlretrieve(url,filename)

You want to put your urls in a list, then loop through that list and try each one. Catch but ignore any exceptions they throw, and break out of the loop once one succeeds. Try this:

import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop", "other url", "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

# Try each URL in turn; stop at the first one that downloads and opens cleanly.
for url in urls:
    try:
        urllib.request.urlretrieve(url, filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break
    except ValueError:
        pass  # this URL failed, move on to the next one

# Note: if every URL fails, sourceZip is never assigned and the loop below
# raises a NameError; the variant after this snippet guards against that.
for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()
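One way to guard against the all-fail case is Python's for/else: the else block runs only when the loop finishes without hitting break, so a clear error can be raised there. A minimal sketch, reusing the urls, filename and destinationPath variables defined above:

for url in urls:
    try:
        urllib.request.urlretrieve(url, filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break                  # first successful download wins
    except ValueError:
        continue               # this URL failed, try the next one
else:
    # the loop never hit break, so every URL failed
    raise RuntimeError("none of the URLs could be downloaded")

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()

The same idea can also be arranged with the unzip inside the loop, breaking only after a successful extract: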
import urllib.request
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
            "another",
            "yet another",
            "etc")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url, filename)
    except ValueError:
        continue  # this URL failed, try the next one
    sourceZip = zipfile.ZipFile(filename, 'r')

    for name in sourceZip.namelist():
        sourceZip.extract(name, destinationPath)
    sourceZip.close()
    break  # stop after the first successful download and extract

This will work assuming you just want to try each URL once until one works, and then stop.
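Since the traceback shows Python 3.2, the same loop can also be written a little more compactly: ZipFile is usable as a context manager from 3.2 onward, extractall replaces the per-name loop, and also catching urllib.error.URLError is a reasonable defensive choice, since urlretrieve can raise it on network failures depending on the Python version and failure mode. A sketch along those lines:

import urllib.request
import urllib.error
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
           "another",
           "yet another")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url, filename)
    except (ValueError, urllib.error.URLError):
        continue                                   # download failed, try the next URL
    with zipfile.ZipFile(filename) as sourceZip:
        sourceZip.extractall(destinationPath)      # extract everything in one call
    break                                          # stop after the first successful download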

For full-fledged distributed tasks you can check out Celery and its retry mechanism, Celery-retry,

or you can have a look at a retry decorator. Example:

import math
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
  """Retries a function or method until it returns True.

  delay sets the initial delay, and backoff sets how much the delay should
  lengthen after each failure. backoff must be greater than 1, or else it
  isn't really a backoff. tries must be at least 0, and delay greater than
  0."""

  if backoff <= 1:
    raise ValueError("backoff must be greater than 1")

  tries = math.floor(tries)
  if tries < 0:
    raise ValueError("tries must be 0 or greater")

  if delay <= 0:
    raise ValueError("delay must be greater than 0")

  def deco_retry(f):
    def f_retry(*args, **kwargs):
      mtries, mdelay = tries, delay # make mutable

      rv = f(*args, **kwargs) # first attempt
      while mtries > 0:
        if rv == True: # Done on success
          return True

        mtries -= 1      # consume an attempt
        time.sleep(mdelay) # wait...
        mdelay *= backoff  # make future wait longer

        rv = f(*args, **kwargs) # Try again

      return False # Ran out of tries :-(

    return f_retry # true decorator -> decorated function
  return deco_retry  # @retry(arg[, ...]) -> true decorator
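The decorator expects the wrapped function to return True on success, so one hedged way to tie it to the download step is a small wrapper like the one below (download_archive, the retry parameters and the example URL are illustrative, not part of the original recipe):

import urllib.request
import urllib.error

# Retry up to 4 more times after the first attempt, pausing 3 s initially
# and doubling the pause after every failure.
@retry(4, delay=3, backoff=2)
def download_archive(url, filename):
    """Return True on success so the retry decorator knows when to stop."""
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except (ValueError, urllib.error.URLError):
        return False

if not download_archive("http://url.com/archive.zip?key=...", "C:/test/test.zip"):
    print("download still failing after all retries")

Unlike the loop-based answers above, this keeps retrying the same URL with growing pauses before giving up, which may or may not help against a hard daily limit. Alternatively, simply putting the keys in a list and looping over them looks like this: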
urls = [
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
]

for u in urls:
    urllib.request.urlretrieve(u, filename)
    ... rest of code ...
