How to do multipart download of large files from S3 in python?
I am looking for some code in Python that allows me to do a multipart download of large files from S3. I found this github page, but it is too complex, with all the command-line argument passing, the parser, and other things that make the code difficult for me to understand. I am not looking for anything fancy; I just want basic code so that I can statically put 2-3 filenames into it and have it perform a multipart download of those files.

Can anyone provide me with such a solution or a link to one? Or maybe help me clean up the code in the link I posted above?
This is old, but here is what I did to get this to work:
conn.download_file(
    Bucket=bucket,
    Filename=key.split("/")[-1],
    Key=key,
    Config=boto3.s3.transfer.TransferConfig(
        max_concurrency=parallel_threads
    )
)
Here is how I used it in a more complete script:
import boto3
import math
import os
import time


def s3_get_meta_data(conn, bucket, key):
    # HEAD the object to learn its size without downloading it
    meta_data = conn.head_object(
        Bucket=bucket,
        Key=key
    )
    return meta_data


def s3_download(conn, bucket, key, parallel_threads):
    start = time.time()
    md = s3_get_meta_data(conn, bucket, key)
    chunk = get_chunks(md["ContentLength"], parallel_threads)
    print("Making %s parallel s3 calls with a chunk size of %s each..." % (
        parallel_threads, convert_size(chunk))
    )
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    conn.download_file(
        Bucket=bucket,
        Filename=os.path.join(cur_dir, key.split("/")[-1]),
        Key=key,
        Config=boto3.s3.transfer.TransferConfig(
            max_concurrency=parallel_threads
        )
    )
    end = time.time() - start
    print("Finished downloading %s in %s seconds" % (key, end))


def convert_size(size_bytes):
    # Pretty-print a byte count as a human-readable size
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])


def get_chunks(size_bytes, desired_sections):
    # Approximate size of each part when the download is split evenly
    return size_bytes / desired_sections


session = boto3.Session(profile_name="my_profile")
conn = session.client("s3", region_name="us-west-2")
s3_download(
    conn,
    "my-bucket-name",
    "my/key/path.zip",
    5
)
More options can be supplied to the Config parameter; read about them in the AWS docs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig
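For example, TransferConfig also lets you control when multipart kicks in and how large each part is. The values below are illustrative, not recommendations:

```python
import boto3.s3.transfer

config = boto3.s3.transfer.TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # objects larger than this use multipart
    multipart_chunksize=8 * 1024 * 1024,  # size of each downloaded part
    max_concurrency=10,                   # number of parallel threads
    use_threads=True,                     # set False to force a single request
)
```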