Boto3 multipart upload and md5 checking
Is there a boto3 function to upload a file to S3 that verifies the MD5 checksum after upload and takes care of multipart uploads and other concurrency issues?

According to the documentation, upload_file takes care of multipart uploads, and put_object can check the MD5 sum. Is there a way for me to do both without writing a long function of my own? The aws-cli is based on boto3 and it does exactly this ( https://docs.aws.amazon.com/cli/latest/topic/s3-faq.html ), but I'm not sure about boto3 itself.
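For the single-request case, put_object can already verify the checksum server-side: S3 compares the uploaded bytes against the Content-MD5 header and rejects the request with a BadDigest error on mismatch. A minimal sketch (the helper names here are my own, not a boto3 API):

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    # S3 expects the MD5 digest base64-encoded in the Content-MD5 header,
    # not the usual hex string.
    return base64.b64encode(hashlib.md5(data).digest()).decode()


def put_with_md5(client, bucket, key, file_path):
    # Single-request upload; S3 rejects the PUT with a BadDigest error
    # if the bytes it receives do not hash to the supplied Content-MD5.
    with open(file_path, "rb") as fp:
        body = fp.read()
    return client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentMD5=content_md5(body)
    )
```

This only helps below the multipart threshold, though; the whole question is what to do once upload_file switches to multipart.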
As far as I know, there is no native way in boto3 to do a multipart upload and then easily compare MD5 hashes. If you want to stick with boto3 and multipart uploads, the answer is to either use the aws-cli or something like the code below (please note, this is a rough example, not production code):
```python
import hashlib

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.client import Config
from botocore.exceptions import ClientError

chunk_size = 8 * 1024 * 1024


# This function is reworked from: https://stackoverflow.com/questions/43794838/multipart-upload-to-s3-with-hash-verification
# Credits to user: https://stackoverflow.com/users/518169/hyperknot
def calculate_s3_etag(file_path, chunk_size=chunk_size):
    # MD5 each chunk the same way the multipart upload splits the file.
    chunk_md5s = []
    with open(file_path, 'rb') as fp:
        while True:
            data = fp.read(chunk_size)
            if not data:
                break
            chunk_md5s.append(hashlib.md5(data))
    num_hashes = len(chunk_md5s)
    if not num_hashes:
        # Empty file -- handle however you want here
        raise ValueError(f"{file_path} is empty")
    if num_hashes == 1:
        # Single part: the ETag is just the file's plain MD5.
        return chunk_md5s[0].hexdigest()
    # Multipart: MD5 of the concatenated part digests, plus the part count.
    digest_byte_string = b''.join(m.digest() for m in chunk_md5s)
    digests_md5 = hashlib.md5(digest_byte_string)
    return f"{digests_md5.hexdigest()}-{num_hashes}"


def s3_md5sum(bucket_name, resource_name, client):
    try:
        # The ETag comes back wrapped in double quotes; strip them.
        return client.head_object(
            Bucket=bucket_name,
            Key=resource_name
        )['ETag'][1:-1]
    except ClientError:
        # Object missing or inaccessible -- handle however you want here
        raise


bucket = "<INSERT_BUCKET_NAME>"
file = "<INSERT_FILE_NAME>"
aws_region = "<INSERT_REGION>"
aws_credentials = {
    "aws_access_key_id": "<INSERT_ACCESS_KEY>",
    "aws_secret_access_key": "<INSERT_SECRET_KEY>",
}

client = boto3.client(
    "s3", config=Config(region_name=aws_region), **aws_credentials
)

# multipart_chunksize must match the chunk size used for the local ETag.
transfer_config = TransferConfig(multipart_chunksize=chunk_size)
client.upload_file(file, bucket, file, Config=transfer_config)

tag = calculate_s3_etag(file)
result = s3_md5sum(bucket, file, client)
assert tag == result
```
Explanation:

For a multipart upload, S3 does not set the ETag to the MD5 of the whole file. Instead, each part is hashed with MD5 individually, the binary digests are concatenated and hashed again, and the result is suffixed with `-<number of parts>` (this is the scheme described in the aws-cli FAQ linked above). calculate_s3_etag reproduces that computation locally, so the comparison only works if multipart_chunksize in the TransferConfig matches the chunk_size used for the local ETag. Also note that objects encrypted with SSE-KMS or SSE-C have ETags that are not MD5-based, so this check does not apply to them.
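The ETag scheme can be sanity-checked locally without touching S3. A small self-contained demonstration (the file contents and chunk sizes here are arbitrary, and `local_etag` repeats the logic of calculate_s3_etag above):

```python
import hashlib
import os
import tempfile


def local_etag(path, chunk_size):
    # Same scheme as calculate_s3_etag: MD5 per chunk, then MD5 of the
    # concatenated digests with "-<part count>" appended.
    md5s = []
    with open(path, "rb") as fp:
        while True:
            data = fp.read(chunk_size)
            if not data:
                break
            md5s.append(hashlib.md5(data))
    if len(md5s) == 1:
        return md5s[0].hexdigest()
    combined = hashlib.md5(b"".join(m.digest() for m in md5s))
    return f"{combined.hexdigest()}-{len(md5s)}"


# Write 10 bytes, then pretend the part size is 4 bytes -> 3 parts.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name
try:
    multi = local_etag(path, chunk_size=4)    # "<hex>-3"
    single = local_etag(path, chunk_size=64)  # whole file in one part
finally:
    os.remove(path)
```

Here `single` is just the plain MD5 of the file, while `multi` ends in `-3`; a whole-file MD5 will never match a multipart ETag, which is why the chunk sizes on both sides have to line up.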