How to stream a very large file from a URL to S3 using boto3 and Python?

Question

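I want to stream a large file from a URL directly (by splitting it into smaller parts) from the server where the data lives to an S3 bucket on AWS. I want to do this to avoid saving very large files on the temporary EC2 instance that will perform this operation. Currently I am trying to do it like this: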

(For now, for testing, I am downloading a smaller sample CSV file.)

import boto3
import boto3.s3.transfer
import requests

link = "https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2021-financial-year-provisional/Download-data/annual-enterprise-survey-2021-financial-year-provisional-csv.csv"
session = requests.Session()
response = session.get(link)
s3_bucket = "my-bucket-name"
s3_file_path = "file-path-to-my-file/data1.csv"
s3 = boto3.client('s3')
response.raw.decode_content = True
conf = boto3.s3.transfer.TransferConfig(multipart_threshold=2, max_concurrency=4)
s3.upload_fileobj(response.raw, s3_bucket, s3_file_path, Config=conf)

Unfortunately, when I run this code a file is created on S3, but it contains 0 bytes of data. Could someone kindly point me toward the right solution?

Answer 1

Your code uses:

session = requests.Session()
response = session.get(link)

This returns a <class 'requests.models.Response'>.

It then uses:

response.raw.decode_content = True
...response.raw...

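However, response.raw is a <class 'urllib3.response.HTTPResponse'>, i.e. the underlying raw byte stream. Because session.get(link) was called without stream=True, requests has already read that stream to the end in order to build response.content, so upload_fileobj finds nothing left to read. That is why the S3 object contains 0 bytes.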
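When requests.get is called without stream=True, requests reads the whole body from response.raw before returning, to populate response.content. The effect on a later reader can be reproduced with the standard library alone. In this sketch io.BytesIO merely stands in for response.raw (an assumption for illustration; requests actually uses a urllib3 stream), and the CSV bytes are a made-up placeholder:

```python
import io

# Stand-in for response.raw: a readable byte stream (placeholder data).
raw = io.BytesIO(b"col1,col2\n1,2\n")

# Without stream=True, requests reads the whole body up front to build
# response.content; this read plays that role.
content = raw.read()

# A later consumer such as upload_fileobj starts at the end of the stream
# and therefore gets empty bytes.
leftover = raw.read()
print(len(content), len(leftover))  # prints: 14 0
```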

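Instead, it should use response.content, which returns the body of the response as bytes. Since upload_fileobj expects a file-like object with a read() method, those bytes have to be wrapped (for example in io.BytesIO) before being passed in; note that this keeps the whole file in memory.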
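A minimal sketch of this suggestion, assuming a boto3 S3 client is passed in as s3_client (the function name upload_response_content is hypothetical). Because response.content is a bytes object and upload_fileobj reads from a file-like object, the bytes are wrapped in io.BytesIO:

```python
import io
from typing import Any


def upload_response_content(s3_client: Any, content: bytes,
                            bucket: str, key: str) -> None:
    """Upload an already-downloaded body (e.g. response.content) to S3.

    upload_fileobj needs a readable file-like object rather than raw bytes,
    so the payload is wrapped in io.BytesIO. The whole file stays in memory.
    """
    s3_client.upload_fileobj(io.BytesIO(content), bucket, key)
```

With the question's variables this would be called as upload_response_content(boto3.client("s3"), response.content, s3_bucket, s3_file_path). Because the entire body is buffered in RAM, this suits the small test CSV but not the very large files the question is ultimately about.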

Tip: when debugging, it helps to print things like type(response.raw).
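If the question's original goal matters (never holding the whole file in memory or on disk), an alternative that is not part of the answer above is to keep response.raw but request it with stream=True, so requests leaves the body unread and upload_fileobj consumes it chunk by chunk. The sketch below assumes a requests.Session and a boto3 S3 client are passed in; the function name stream_url_to_s3 and the 60-second timeout are hypothetical choices:

```python
from typing import Any, Optional


def stream_url_to_s3(session: Any, s3_client: Any, url: str, bucket: str,
                     key: str, config: Optional[Any] = None) -> None:
    """Copy the body at url to s3://bucket/key without writing it to disk.

    session   -- e.g. requests.Session()
    s3_client -- e.g. boto3.client("s3")
    config    -- optional boto3.s3.transfer.TransferConfig
    """
    # stream=True: requests returns once the headers arrive and leaves the
    # body unread in response.raw. The timeout value is an arbitrary choice.
    with session.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        # Let urllib3 undo any gzip/deflate Content-Encoding while reading.
        response.raw.decode_content = True
        # upload_fileobj pulls data from the stream with read() calls and
        # switches to a multipart upload above the configured threshold.
        s3_client.upload_fileobj(response.raw, bucket, key, Config=config)
```

A possible call with the question's variables: stream_url_to_s3(requests.Session(), boto3.client("s3"), link, s3_bucket, s3_file_path, config=boto3.s3.transfer.TransferConfig(multipart_threshold=8 * 1024 * 1024, multipart_chunksize=8 * 1024 * 1024, max_concurrency=4)). TransferConfig sizes are given in bytes, so the question's multipart_threshold=2 means 2 bytes, not 2 MB.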

Solution 1
0 2022-09-02 00:23:06
