Complete a multipart_upload with boto3?

Tried this:

import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz" # this happens to be a 5.9 Gig file
client = boto3.client('s3', region)  # 'region' is assumed to be defined earlier, e.g. "us-east-1"
config = TransferConfig(
    multipart_threshold=4*1024, # number of bytes
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file(path+fileName, 'bucket', 'key')

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

I found this example, but part is not defined.

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName,'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)

Question: Does anyone know how to use multipart upload with boto3?

I would advise you to use boto3.s3.transfer for this purpose. Here is an example:

import boto3


def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")

    try:
        print("Uploading file: {}".format(filename))

        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client, config=tc)

        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")

    except Exception as e:
        print("Error uploading: {}".format(e))

Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:

import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')

You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8 MB).

You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
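As a side note, one hint (though not an official guarantee) that a multipart upload really happened is the object's ETag: objects uploaded via the multipart API typically get an ETag of the form hash-partcount, while single-PUT objects get a plain MD5. A minimal sketch, with placeholder bucket and key names:

import boto3

s3 = boto3.client('s3')

# 'some_bucket' and 'some_key' are placeholders for your own bucket/key
head = s3.head_object(Bucket='some_bucket', Key='some_key')
etag = head['ETag'].strip('"')

# Multipart-uploaded objects usually have ETags like "abc123...-42", where the
# suffix is the number of parts; single-PUT objects have a plain MD5 hash
if '-' in etag:
    print("Uploaded via multipart ({} parts)".format(etag.split('-')[1]))
else:
    print("Uploaded as a single object")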

As described in the official boto3 documentation:

The AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.

The management operations are performed by using reasonable default settings that are well-suited for most scenarios.

So all you need to do is set the desired multipart threshold, which indicates the minimum file size above which the Python SDK will automatically handle the upload as a multipart upload:

import boto3
from boto3.s3.transfer import TransferConfig

# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)

# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)

Moreover, you can also control the multithreading used for multipart transfers by setting max_concurrency:

# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)

# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
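The snippet above is a download, but the same TransferConfig applies to uploads as well. A minimal sketch of the upload counterpart, with the same placeholder names:

import boto3
from boto3.s3.transfer import TransferConfig

# The same max_concurrency setting also controls upload concurrency
config = TransferConfig(max_concurrency=5)

s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)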

And finally, if you want to perform the multipart transfer in a single thread, just set use_threads=False:

# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)

s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)

Complete source code with explanation: Python S3 Multipart File Upload with Metadata and Progress Indicator

Why not just use the copy option in boto3?

s3.copy(CopySource={
        'Bucket': sourceBucket,
        'Key': sourceKey}, 
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'})

Details on how to initialise the s3 object, and further options for the call, are available in the boto3 docs.
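For reference, a minimal sketch of how the s3 object and the variables used above could be set up (the bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')

sourceBucket, sourceKey = 'my-source-bucket', 'path/to/big-object.dat'
targetBucket, targetKey = 'my-target-bucket', 'path/to/big-object.dat'

# copy() is a managed transfer: boto3 switches to a multipart copy
# automatically when the object is large enough
s3.copy(CopySource={'Bucket': sourceBucket, 'Key': sourceKey},
        Bucket=targetBucket,
        Key=targetKey,
        ExtraArgs={'ACL': 'bucket-owner-full-control'})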

In your code snippet, part should clearly be part1 in the dictionary. Typically you would have several parts (otherwise why use multipart upload at all), and the 'Parts' list would contain one element for each part, as in the sketch below.
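For reference, a sketch of what the multi-part case could look like, reading the file in fixed-size chunks (the chunk size and file name here are just illustrative):

import boto3

bucket, key = 'bucket', 'key'
chunk_size = 100 * 1024 * 1024  # every part except the last must be at least 5 MB

s3 = boto3.client('s3')
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)

parts = []
with open('/temp/bigFile.gz', 'rb') as data:
    part_number = 1
    while True:
        chunk = data.read(chunk_size)
        if not chunk:
            break
        part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu['UploadId'], Body=chunk)
        # Collect the part number and ETag needed by complete_multipart_upload
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})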

You may also be interested in the new pythonic interface for dealing with S3: http://s3fs.readthedocs.org/en/latest/

copy from boto3 is a managed transfer which will perform a multipart copy in multiple threads if necessary.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy

This works with objects greater than 5 GB, and I have already tested it.
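A minimal sketch of the resource-level S3.Object.copy from that link (bucket and key names are placeholders); a TransferConfig can be passed to control when the managed copy switches to multipart:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.resource('s3')

copy_source = {'Bucket': 'source-bucket', 'Key': 'big-object.dat'}
config = TransferConfig(multipart_threshold=5 * 1024 ** 3)  # 5 GB

# Managed copy: performed as a multipart copy in multiple threads when necessary
s3.Object('target-bucket', 'big-object.dat').copy(copy_source, Config=config)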

Change part to part1:

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path + fileName, 'rb') as data:
    part1 = s3.upload_part(Bucket=bucket,
                           Key=key,
                           PartNumber=1,
                           UploadId=mpu['UploadId'],
                           Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket,
                             Key=key,
                             UploadId=mpu['UploadId'],
                             MultipartUpload=part_info)
