Downloading a file in chunks in Python?

Question

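I am writing a simple synchronous download manager that downloads a video file in 10 parts. I use requests to get the content length from the headers. Using that, I split the file into 10 byte ranges, download each one, and then merge them to form the complete video. The code below is supposed to work this way, but the final merged file only plays for a few seconds and is corrupted after that. What is wrong with my code?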

import requests
import os

def intervals(parts, duration):
    part_duration = duration // parts
    return [(i * part_duration, (i + 1) * part_duration) for i in range(parts)]

home = os.path.expanduser("~")
if not os.path.exists(home+'/Desktop/temp'):
    os.makedirs(home+'/Desktop/temp')

PATH = home+"/Desktop/temp/tmp.mp4"

example_file_url = "https://file-examples-com.github.io/uploads/2017/04/file_example_MP4_1280_10MG.mp4"


req = requests.head(example_file_url)

size = int(req.headers['Content-Length'])

content_section = 10

section_intervals = intervals(content_section,size)


with  open(PATH, "wb") as file:
    for i,(start,end) in enumerate(section_intervals):
        headers = {"Range": "bytes="+str(start)+"-"+str(end)}
        print(headers)
        r = requests.get(example_file_url, headers=headers)
        file.write(r.content)

Answer 1

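Problem

Your ranges are wrong, because the interval in a Range header gives the first and last offsets, both inclusive. For example, bytes=0-10 means the 11 bytes from offset 0 to offset 10 (unlike how slicing works in Python), so bytes=0-10 and bytes=10-20 are overlapping ranges. You need, for example, bytes=0-9 followed by bytes=10-19.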

See the example in this documentation:

The header to request the first 1024 bytes... Range: bytes=0-1023

(whereas the slice [0:1023] in Python has length 1023).
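To make the off-by-one concrete, here is a minimal sketch that compares an inclusive byte range with a Python slice built from the same two numbers. The helper names range_header and range_length are made up for illustration, and data is a local stand-in for the remote file, so nothing is downloaded.

```python
def range_header(start, end):
    """Build a Range header value; both offsets are inclusive."""
    return f"bytes={start}-{end}"

def range_length(start, end):
    """Number of bytes covered by the inclusive range start-end."""
    return end - start + 1

data = bytes(range(32))   # local stand-in for a remote file

# A server answering bytes=0-10 sends offsets 0 through 10 inclusive.
served = data[0:10 + 1]
print(range_header(0, 10), len(served), range_length(0, 10))

# A Python slice written with the same two numbers is one byte shorter.
print(len(data[0:10]))
```

So a slice data[a:b] corresponds to the header bytes=a-(b-1), not bytes=a-b.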

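You say it "works for a few seconds and then gets corrupted"; I assume you mean that the first few seconds of the decoded MP4 output are valid. The point where it breaks will be the end of the first downloaded part, where the last byte of the first part is repeated at the start of the second part.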
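The repeated byte can be reproduced offline. The sketch below assumes a hypothetical serve_range helper that mimics how a server answers an inclusive Range request, and uses a 20-byte stand-in for the video rather than the real file.

```python
def serve_range(data, start, end):
    """Mimic a server answering 'Range: bytes=start-end' (inclusive)."""
    return data[start:end + 1]

original = bytes(range(20))   # stand-in for the video file

# Overlapping ranges in the style of the question: 0-10, then 10-19.
merged = serve_range(original, 0, 10) + serve_range(original, 10, 19)

print(len(original), len(merged))     # the merged copy is one byte longer
print(merged[:11] == original[:11])   # the first part is still intact
print(merged[10], merged[11])         # the byte at offset 10 appears twice
```

Everything after the first part is shifted by one byte, which is consistent with a file that decodes correctly only at the beginning.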

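The other problem is that your total length is wrong: you do an integer division by parts, and when you multiply back again you have already lost the fractional remainder.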
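The lost remainder is easy to check with plain arithmetic. This small sketch uses the Content-Length reported further down in this answer (9840497) and the 10 parts from the question.

```python
size = 9840497   # Content-Length of the example file, as printed in this answer
parts = 10

part_size = size // parts     # floor division drops the remainder
covered = part_size * parts   # highest offset the original ranges reach

print(part_size, covered, size % parts)
```

Multiplying back gives 9840490, so the remainder of 7 is never added to any range and the tail of the file is not requested.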

Fix

Change the intervals function to this, and it works:

import math

def intervals(parts, duration):
    part_duration = math.ceil(duration / parts)
    return [(start, min(start + part_duration - 1, duration - 1)) 
             for start in range(0, duration, part_duration)]
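As a rough sketch of how the corrected function slots into the download loop, the version below takes the fetching step as a parameter so it can be run offline. The names download_in_parts and fetch are hypothetical and not part of the original code, and the length check is an extra safeguard added for illustration.

```python
import io
import math

def intervals(parts, duration):
    # Same function as in the fix above: inclusive, non-overlapping ranges.
    part_duration = math.ceil(duration / parts)
    return [(start, min(start + part_duration - 1, duration - 1))
            for start in range(0, duration, part_duration)]

def download_in_parts(fetch, size, parts, out):
    """Write size bytes to out, calling fetch(start, end) for each
    inclusive byte range. fetch is a hypothetical callable."""
    for start, end in intervals(parts, size):
        chunk = fetch(start, end)
        # Each chunk should be exactly as long as the range requested.
        if len(chunk) != end - start + 1:
            raise ValueError(f"unexpected length for bytes={start}-{end}")
        out.write(chunk)

# Offline stand-in for the server: 1003 bytes of known content.
source = bytes(i % 251 for i in range(1003))
buffer = io.BytesIO()
download_in_parts(lambda s, e: source[s:e + 1], len(source), 10, buffer)
print(buffer.getvalue() == source)
```

With requests, fetch would be a small function that sends the same Range header as the loop in the question and returns r.content, and out would be the file opened in "wb" mode.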

Checking the ranges

Insert print statements:

print("Size = ", size)
print(section_intervals)

This now gives:

Size =  9840497
[(0, 984049), (984050, 1968099), (1968100, 2952149), (2952150, 3936199), (3936200, 4920249), (4920250, 5904299), (5904300, 6888349), (6888350, 7872399), (7872400, 8856449), (8856450, 9840496)]

Whereas with your original intervals function, it gives:

Size =  9840497
[(0, 984049), (984049, 1968098), (1968098, 2952147), (2952147, 3936196), (3936196, 4920245), (4920245, 5904294), (5904294, 6888343), (6888343, 7872392), (7872392, 8856441), (8856441, 9840490)]

Note the overlapping ranges and the missing bytes at the end.
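The same comparison can be done mechanically instead of by eye. The following is a minimal sketch of a checker, with check_intervals as a made-up name, applied to the two lists printed above.

```python
def check_intervals(ranges, size):
    """Return a list of problems found in inclusive (start, end) ranges."""
    problems = []
    expected_start = 0
    for start, end in ranges:
        if start < expected_start:
            problems.append(f"overlap at offset {start}")
        elif start > expected_start:
            problems.append(f"gap before offset {start}")
        expected_start = end + 1
    if expected_start < size:
        problems.append(f"{size - expected_start} bytes missing at the end")
    return problems

size = 9840497
fixed = [(0, 984049), (984050, 1968099), (1968100, 2952149),
         (2952150, 3936199), (3936200, 4920249), (4920250, 5904299),
         (5904300, 6888349), (6888350, 7872399), (7872400, 8856449),
         (8856450, 9840496)]
original = [(0, 984049), (984049, 1968098), (1968098, 2952147),
            (2952147, 3936196), (3936196, 4920245), (4920245, 5904294),
            (5904294, 6888343), (6888343, 7872392), (7872392, 8856441),
            (8856441, 9840490)]

print(check_intervals(fixed, size))
for problem in check_intervals(original, size):
    print(problem)
```

The fixed list produces no problems, while the original list reports an overlap at the start of every part after the first, plus the missing tail.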

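Verifying the output with md5sum

Finally, we can verify the download by computing a checksum. In this example I use md5sum from the Linux command line (although cksum would also work, since a cryptographic checksum is not needed).

I called the output myoutput.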

$ md5sum myoutput
10c918b1d01aea85864ee65d9e0c2305  myoutput

Now I also downloaded a copy directly with wget <url>, and see that it has the same checksum.

$ wget https://file-examples-com.github.io/uploads/2017/04/file_example_MP4_1280_10MG.mp4
--2020-07-21 08:26:52--  https://file-examples-com.github.io/uploads/2017/04/file_example_MP4_1280_10MG.mp4

$ md5sum file_example_MP4_1280_10MG.mp4 
10c918b1d01aea85864ee65d9e0c2305  file_example_MP4_1280_10MG.mp4
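If md5sum is not available, the same comparison can be sketched in Python with the standard hashlib module. The helper name md5_of_file is made up for illustration, and the demonstration uses a small temporary file rather than the real video.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, block_size=65536):
    """Return the hex MD5 digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Demonstration on a temporary file with known content.
payload = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
matches = md5_of_file(tmp.name) == hashlib.md5(payload).hexdigest()
os.remove(tmp.name)
print(matches)
```

For the files in this answer, one would compare md5_of_file("myoutput") with md5_of_file("file_example_MP4_1280_10MG.mp4") and expect the two digests to be equal.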

Solution 1
1 Accepted 2020-07-21 07:18:01
