
Amazon S3 - multipart upload vs split files-then-upload

I am currently trying to upload files from local storage to S3 using Python. I have extremely large files (over 10 GB), and when I went through some best practices for faster uploads, I came across multipart upload. If I understood correctly, multipart upload does the following (a minimal boto3 sketch follows the list):

  1. Split the file into a number of chunks.
  2. Upload each of these chunks to S3 (either serially or in parallel based on our code).
  3. Once the upload of all the chunks is complete, S3 takes care of assembling the individual chunks into a single final object/file.
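For context, here is a minimal boto3 sketch of those three steps (the bucket, key, and part size are placeholder values I made up):

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "bigfile.bin"  # placeholder names
part_size = 100 * 1024 * 1024             # 100 MB per part (S3 minimum is 5 MB)

# 1. Start the multipart upload
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)

# 2. Upload each chunk (serially here; the calls can also run in parallel)
parts = []
with open("bigfile.bin", "rb") as f:
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu["UploadId"], Body=chunk)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# 3. Ask S3 to assemble the parts into a single final object
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})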

So after all the chunks are uploaded, multipart upload assembles them into a single object. But I want to keep the individual parts as they are, or find another way to split the files and upload them using Python boto's put_object method. This is because I want the individual chunks/parts of the file to be read from S3 in parallel for further processing. Is there a way to do this, or should I stick with the traditional approach of splitting the file myself and uploading the pieces in parallel (for faster upload)? A sketch of what I mean follows.
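To make the alternative concrete, here is a rough sketch of the split-and-upload-separately idea with put_object; the chunk size, worker count, and part naming scheme are just illustrative choices:

import os
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
bucket = "my-bucket"            # placeholder
path = "bigfile.bin"
chunk_size = 512 * 1024 * 1024  # 512 MB per piece

def upload_chunk(index, offset, length):
    # Each worker reads its own slice of the file and uploads it as an
    # independent object, e.g. bigfile.bin.part-0003
    with open(path, "rb") as f:
        f.seek(offset)
        s3.put_object(Bucket=bucket,
                      Key=f"{path}.part-{index:04d}",
                      Body=f.read(length))

size = os.path.getsize(path)
with ThreadPoolExecutor(max_workers=8) as pool:
    for i, offset in enumerate(range(0, size, chunk_size)):
        pool.submit(upload_chunk, i, offset, min(chunk_size, size - offset))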

Thanks in advance.

We had the same problem and here is the approach we took.

Enable Transfer Acceleration for your bucket:

https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
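For what it's worth, here is roughly what enabling acceleration and routing transfers through the accelerated endpoint looks like from boto3 (bucket and file names are placeholders):

import boto3
from botocore.config import Config

# One-time setup: turn acceleration on for the bucket
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Create a client that uses the accelerated endpoint for transfers
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3.upload_file("bigfile.bin", "my-bucket", "bigfile.bin")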

If your upload bandwidth is limited, there is no point in splitting the files.

If you have enormous upload bandwidth and a single accelerated endpoint is not consuming all of it, you can split the file and upload the parts in parallel with multipart upload.
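With boto3's transfer manager this amounts to a single upload_file call; a sketch, where the part size and concurrency are values you would tune against your bandwidth:

import boto3
from boto3.s3.transfer import TransferConfig

# Multipart upload with 100 MB parts and 8 part uploads in flight at once
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=100 * 1024 * 1024,
                        max_concurrency=8)

boto3.client("s3").upload_file("bigfile.bin", "my-bucket", "bigfile.bin",
                               Config=config)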

Uploading a single S3 object/file with multipart upload:

Detailed instructions are covered in the following link.

https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/

Create Multipart Upload:

aws s3api create-multipart-upload --bucket multirecv --key testfile --metadata md5=mvhFZXpr7J5u0ooXDoZ/4Q==

Upload File Parts:

aws s3api upload-part --bucket multirecv --key testfile --part-number 1 --body testfile.001 --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk --content-md5 Vuoo2L6aAmjr+4sRXUwf0w==

List Parts and Complete Upload:

aws s3api list-parts --bucket multirecv --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
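The list-parts output above gives each part's ETag. You then finish with complete-multipart-upload, where the mpustruct JSON file (as in the linked article) lists every PartNumber/ETag pair:

aws s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket multirecv --key testfile --upload-id sDCDOJiTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk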

Hope it helps.

EDIT1

Partial Read from S3:

With S3 you don't need to read the full object. You can specify a start and end byte range for the object, so you don't need to maintain the splits in S3; you can keep it as a single object. The command below reads it partially.

One more benefit is that you can issue these ranged reads in parallel as well.

aws s3api get-object --bucket my_bucket --key object/location/file.txt --range bytes=1000-2000 file1.range-1000-2000.txt
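The same ranged read can be done from boto3 with get_object's Range parameter and fanned out across threads; a sketch (the part size and worker count are arbitrary examples):

import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
bucket, key = "my_bucket", "object/location/file.txt"  # from the example above
part_size = 1000

def read_range(start, end):
    # Fetch only bytes [start, end] of the object (Range is inclusive)
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
ranges = [(off, min(off + part_size, size) - 1)
          for off in range(0, size, part_size)]
with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(lambda r: read_range(*r), ranges))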
