Uploading large files from web browser and transferring to Amazon S3

We currently have a small web app, part of which is file uploads. We are using Plupload on the client with chunking enabled to allow large files to be uploaded. The files are saved on the app server, and the chunks are appended as they come up.

Now we are moving to Amazon S3 for file storage, with the possibility of multiple app servers. I'm finding it difficult to work out how to handle these chunks. I was trying to follow their example, but I'm running into problems. The meat of what I'm trying looks like this:

UploadPartRequest uploadRequest = new UploadPartRequest()
    .withBucketName(bucket).withKey(key)
    .withUploadId(uploadId).withPartNumber(partNumber)
    .withPartSize(bytes.length)
    .withInputStream(new ByteArrayInputStream(bytes));

s3Client.uploadPart(uploadRequest);

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from the initializing of the upload, but how do I associate that with later chunks that come up? I thought I could perhaps send it down with the first response, and then send it back up with each chunk request. That didn't seem like too far of a reach.

Then I found that in order to complete the upload I need a List<PartETag>, with the PartETags getting returned from each upload to Amazon S3. So, my next question was how do I save all of these PartETags while the chunks are being uploaded from the browser? My first thought was I could send down the PartETag of each chunk in the response, and then store those client side. I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags. If there's not, I'd just have to send up all the ones I have each time, and then only the last request would use them. This all seems a little hacky to me.

So, I'm thinking someone has to have dealt with this before. Is there a good, standard way of doing this?

I thought about constructing the file on the app server and then sending it over to S3, but with multiple app servers, the chunks aren't guaranteed to end up in the same place.

Another thought I've had is to store all this information in the database during the upload, but I wasn't sure I wanted to have to go hit the database with each chunk request. Are there any other options besides this?

I appreciate any help anyone can provide.

Try our IaaS solution:

https://uploadcare.com

It supports file sizes up to 5 GB. Here is an article about a successful use case for uploading large files using our system:

https://community.skuidify.com/skuid/topics/how_to_upload_large_files_using_uploadcare_com

Correct me if I'm wrong, but as I understand your question, your web servers act as proxies between the browser and Amazon S3.

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from the initializing of the upload, but how do I associate that with later chunks that come up?

In the BeforeUpload event you can add the uploadId as a query string parameter, as in this answer.

My first thought was I could send down the PartETag of each chunk in the response, and then store those client side.

This seems like a good idea. You could then alter the query string as above on 'ChunkUploaded' to add the PartETag just received, thus transferring all previously received PartETags with each request. I'm not sure whether altering the query string between chunks is possible, or whether you can synchronously do some processing before the upload of the next chunk starts, but I would say it is worth a try.
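
To make the idea concrete, here is a minimal Java sketch of how the server side might pack and unpack the PartETag values that travel in the query string. The PartTagCodec class and the "partNumber:etag,partNumber:etag" format are made up for illustration and are not part of Plupload or the AWS SDK. The sketch assumes the ETag strings contain neither ',' nor ':' and that URL-encoding of the whole value is handled elsewhere.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the class name and the wire format are assumptions
// for illustration, not something Plupload or the AWS SDK defines.
final class PartTagCodec {
    private PartTagCodec() {}

    /** Joins part numbers and ETags as "1:etagA,2:etagB", sorted by part number. */
    static String encode(Map<Integer, String> tags) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : new TreeMap<>(tags).entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses the string produced by encode(); null or empty input yields an empty map. */
    static TreeMap<Integer, String> decode(String encoded) {
        TreeMap<Integer, String> tags = new TreeMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return tags;
        }
        for (String entry : encoded.split(",")) {
            int sep = entry.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            int partNumber = Integer.parseInt(entry.substring(0, sep));
            tags.put(partNumber, entry.substring(sep + 1));
        }
        return tags;
    }
}
```

On the final chunk, each decoded entry could be turned into a new PartETag(partNumber, eTag) for the complete request. The TreeMap keeps the entries in ascending part-number order, which is the order S3 expects for the parts list.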

I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags.

This can be found in the PHP samples in the Plupload download. Plupload sends two POST parameters to the server:

  • chunks : total number of chunks of the upload (0 if upload not chunked)
  • chunk : index of the current chunk being uploaded

The last chunk is when chunks==0 || chunk==chunks-1
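
On a Java server, that check could be sketched roughly like this. The ChunkPosition class and its method names are hypothetical. The sketch assumes the two values have already been read from the request as ints, and that chunk is zero-based, as the condition above implies.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class ChunkPosition {
    private ChunkPosition() {}

    /**
     * @param chunk  index of the current chunk (the "chunk" POST parameter)
     * @param chunks total number of chunks (the "chunks" POST parameter, 0 if not chunked)
     * @return true when this request carries the final chunk of the file
     */
    static boolean isLastChunk(int chunk, int chunks) {
        return chunks == 0 || chunk == chunks - 1;
    }

    /** S3 part numbers start at 1, so a zero-based chunk index is shifted by one. */
    static int toPartNumber(int chunk) {
        return chunk + 1;
    }
}
```

When isLastChunk returns true, the server would upload that part and then issue the complete request with the collected PartETags; toPartNumber gives the value to pass to withPartNumber.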
