Should I switch from "urllib.request.urlretrieve(..)" to "urllib.request.urlopen(..)"?

Question

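1. Deprecation problem

In Python 3.7, I use the urllib.request.urlretrieve(..) function to download a large file from a URL. In the documentation ( https://docs.python.org/3/library/urllib.request.html ), I read the following just above the urllib.request.urlretrieve(..) docs:

Legacy interface
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.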

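2. Searching for an alternative

To keep my code future-proof, I'm looking for an alternative. The official Python documentation doesn't name a specific one, but urllib.request.urlopen(..) looks like the most straightforward candidate. It sits at the top of the documentation page.

Unfortunately, the alternatives, such as urlopen(..), don't provide a reporthook argument. This argument is a callable you pass to the urlretrieve(..) function. In turn, urlretrieve(..) calls it at regular intervals with the following arguments:

block number
block size
total file size

I use it to update a progress bar. That's why I miss the reporthook argument in the alternatives.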
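To make the three arguments concrete, here is a minimal sketch of a reporthook that turns them into a progress fraction. The helper names progress_fraction and print_progress are made up for illustration, and the URL and file name in the usage comment are placeholders, not part of the original question.

```python
import urllib.request


def progress_fraction(block_num, block_size, total_size):
    """Return the downloaded fraction in [0, 1], or None if the size is unknown.

    urlretrieve(..) passes total_size = -1 when the server sends no
    Content-Length header, so that case has to be handled explicitly.
    """
    if total_size <= 0:
        return None
    # The last block is usually shorter than block_size, so clamp the estimate.
    return min(block_num * block_size, total_size) / total_size


def print_progress(block_num, block_size, total_size):
    """A reporthook: prints a one-line progress indicator."""
    fraction = progress_fraction(block_num, block_size, total_size)
    if fraction is None:
        print(f"\r{block_num * block_size} bytes", end="")
    else:
        print(f"\r{fraction:6.1%}", end="")


# Usage (placeholder URL and file name):
# urllib.request.urlretrieve("https://example.com/big.bin", "big.bin",
#                            reporthook=print_progress)
```

Note that the hook only receives a block count, not the exact number of bytes read, so the fraction is an estimate until the final call.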

3. urlretrieve(..) vs urlopen(..)

I found out that urlretrieve(..) simply uses urlopen(..). See the request.py source file in the Python 3.7 installation (Python37/Lib/urllib/request.py):

_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """
    url_type, path = splittype(url)

    with contextlib.closing(urlopen(url, data)) as fp:
        headers = fp.info()

        # Just return the local path and the "headers" for file://
        # URLs. No sense in performing a copy unless requested.
        if url_type == "file" and not filename:
            return os.path.normpath(path), headers

        # Handle temporary file setup.
        if filename:
            tfp = open(filename, 'wb')
        else:
            tfp = tempfile.NamedTemporaryFile(delete=False)
            filename = tfp.name
            _url_tempfiles.append(filename)

        with tfp:
            result = filename, headers
            bs = 1024*8
            size = -1
            read = 0
            blocknum = 0
            if "content-length" in headers:
                size = int(headers["Content-Length"])

            if reporthook:
                reporthook(blocknum, bs, size)

            while True:
                block = fp.read(bs)
                if not block:
                    break
                read += len(block)
                tfp.write(block)
                blocknum += 1
                if reporthook:
                    reporthook(blocknum, bs, size)

    if size >= 0 and read < size:
        raise ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, size), result)

    return result

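4. Conclusion

From all this, I see three possible decisions:

I keep my code unchanged. Let's hope the urlretrieve(..) function won't get deprecated anytime soon.
I write myself a replacement function that behaves like urlretrieve(..) on the outside and uses urlopen(..) on the inside. Actually, such a function would be a copy-paste of the code above. Doing that feels unclean compared to using the official urlretrieve(..).
I write myself a replacement function that behaves like urlretrieve(..) on the outside and uses something entirely different on the inside. But hey, why would I do that? urlopen(..) is not deprecated, so why not use it?

What decision would you make?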
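For reference, the second option could look roughly like the sketch below. It is a trimmed-down adaptation of the standard-library code quoted above, not an official replacement: the function name download, its chunk_size parameter, and the choice to raise a plain OSError on a short read are all assumptions made for this illustration.

```python
import urllib.request


def download(url, filename, reporthook=None, chunk_size=8 * 1024):
    """Copy the resource at url to filename, calling reporthook like urlretrieve(..).

    url may be a string or a urllib.request.Request object.
    Returns (filename, number_of_bytes_written).
    """
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else -1
        read = 0
        block_num = 0
        if reporthook:
            reporthook(block_num, chunk_size, total)
        while True:
            block = response.read(chunk_size)
            if not block:
                break
            out.write(block)
            read += len(block)
            block_num += 1
            if reporthook:
                reporthook(block_num, chunk_size, total)
    # Mirror urlretrieve(..): complain if fewer bytes arrived than announced.
    if total >= 0 and read < total:
        raise OSError(f"retrieval incomplete: got only {read} out of {total} bytes")
    return filename, read
```

Because urlopen(..) accepts a Request object as well as a string, the same function also works when custom headers are needed.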

Answer 1

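The following example uses urllib.request.urlopen to download a zip file containing Oceania's crop production data from the FAO statistical database. In that example, it is necessary to define a minimal header, otherwise FAOSTAT throws an Error 403: Forbidden.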

import shutil
import urllib.request
import tempfile

# Create a request object with URL and headers
url = "http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_Livestock_E_Oceania.zip"
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '}
req = urllib.request.Request(url=url, headers=header)

# Define the destination file
dest_file = tempfile.gettempdir() + '/' + 'crop.zip'
print(f"File located at:{dest_file}")

# Create an http response object
with urllib.request.urlopen(req) as response:
    # Create a file object
    with open(dest_file, "wb") as f:
        # Copy the binary content of the response to the file
        shutil.copyfileobj(response, f)

Based on https://stackoverflow.com/a/48691447/2641825 for the request part and https://stackoverflow.com/a/66591873/2641825 for the header part. See also the urllib documentation at https://docs.python.org/3/howto/urllib2.html
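The shutil.copyfileobj call above gives no progress feedback, which is what the question's reporthook was used for. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to wrap the destination file in a small object that counts bytes as they are written. The class name ProgressWriter and its callback signature are invented for this example, and in-memory streams stand in for the HTTP response and the output file.

```python
import io
import shutil


class ProgressWriter:
    """Wrap a writable binary file and report the running byte count."""

    def __init__(self, fileobj, callback):
        self._fileobj = fileobj
        self._callback = callback  # called with the total bytes written so far
        self.bytes_written = 0

    def write(self, data):
        written = self._fileobj.write(data)
        self.bytes_written += len(data)
        self._callback(self.bytes_written)
        return written


# Demo with in-memory streams; in the download above, `source` would be the
# response object and `sink` the opened destination file.
source = io.BytesIO(b"a" * 50_000)
sink = io.BytesIO()
seen = []
shutil.copyfileobj(source, ProgressWriter(sink, seen.append))
```

shutil.copyfileobj only needs a write method on its destination, so the wrapper can stay this small. How often the callback fires depends on the buffer size copyfileobj uses, which can be set through its optional length argument.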

Solution 1
1 Accepted 2021-08-16 14:54:45