Uploading large files from web browser and transferring to Amazon S3

We currently have a small web app, part of which is file uploads. Currently we are using Plupload on the client with chunking enabled to allow large files to be uploaded. The files are saved on the app server, and the chunks are appended as they arrive.

Now we are moving to Amazon S3 for file storage, with the possibility of multiple app servers. I'm finding it difficult to handle these chunks. I was trying to follow their example, but I'm running into problems. The meat of what I'm trying looks like this:

// uploadId comes from the InitiateMultipartUploadResult returned by
// s3Client.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key))
UploadPartRequest uploadRequest = new UploadPartRequest()
    .withBucketName(bucket).withKey(key)
    .withUploadId(uploadId).withPartNumber(partNumber)
    .withPartSize(bytes.length)
    .withInputStream(new ByteArrayInputStream(bytes));

// the returned UploadPartResult carries the PartETag needed to complete the upload
PartETag partETag = s3Client.uploadPart(uploadRequest).getPartETag();

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from initializing the upload, but how do I associate that with later chunks as they come up? I thought I could perhaps send it down with the first response, and then send it back up with each chunk request. That didn't seem like too far of a reach.

Then I found that in order to complete the upload I need a List&lt;PartETag&gt;, with the PartETags getting returned from each upload to Amazon S3. So, my next question was: how do I save all of these PartETags while the chunks are being uploaded from the browser? My first thought was that I could send down the PartETag of each chunk in the response, and then store those client side. I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags. If there's not, I'd just have to send up all the ones I have each time, and then only the last request would use them. This all seems a little hacky to me.

So, I'm thinking someone has to have dealt with this before. Is there a good, standard way of doing this?

I thought about assembling the file on the app server and then sending it over to S3, but with multiple app servers, the chunks aren't guaranteed to end up in the same place.

Another thought I've had is to store all this information in the database during the upload, but I wasn't sure I wanted to hit the database with each chunk request. Are there any other options besides this?
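If a per-chunk database round-trip feels heavy, one alternative is a registry keyed by uploadId. A minimal sketch of such a registry is below; it is my own illustration, not an established pattern, and it uses plain strings for the ETags to stay self-contained (in the real app these would be the SDK's PartETag values). Note the caveat in the comments: an in-process map only works when every chunk of an upload reaches the same JVM, so with multiple app servers this state would have to move to a shared store such as Redis or a database table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: tracks the ETag of each uploaded part per S3 uploadId.
// A ConcurrentHashMap is per-JVM only; in a multi-app-server deployment this
// state must live in a shared store (Redis, a database table, etc.).
class UploadRegistry {
    private final Map<String, Map<Integer, String>> parts = new ConcurrentHashMap<>();

    // Record the ETag S3 returned for one uploaded part.
    void recordPart(String uploadId, int partNumber, String eTag) {
        parts.computeIfAbsent(uploadId, id -> new ConcurrentHashMap<>())
             .put(partNumber, eTag);
    }

    // All ETags recorded so far, ordered by part number, ready to be turned
    // into the List<PartETag> that completing the multipart upload requires.
    List<String> partETags(String uploadId) {
        Map<Integer, String> byNumber = parts.getOrDefault(uploadId, Map.of());
        List<Integer> numbers = new ArrayList<>(byNumber.keySet());
        numbers.sort(null);
        List<String> etags = new ArrayList<>();
        for (int n : numbers) {
            etags.add(byNumber.get(n));
        }
        return etags;
    }

    // Drop the state once the multipart upload is completed or aborted.
    void remove(String uploadId) {
        parts.remove(uploadId);
    }
}
```

The upside over the querystring approach is that the browser only ever needs to echo back the uploadId; the downside is the extra shared-state dependency.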

I appreciate any help anyone can provide.

Try our IaaS solution:

https://uploadcare.com

It supports file sizes up to 5 GB. Here is an article about a successful use case for uploading large files using our system:

https://community.skuidify.com/skuid/topics/how_to_upload_large_files_using_uploadcare_com

Correct me if I'm wrong, but as I understand your question, your web servers act as proxies between the browser and S3.

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from initializing the upload, but how do I associate that with later chunks that come up?

On BeforeUpload you may add the uploadId as a querystring parameter, as in this answer.

My first thought was I could send down the PartETag of each chunk in the response, and then store those client side.

This seems a good idea: then alter the querystring as above on 'ChunkUploaded' to add the just-received PartETag, thus transferring all previously received PartETags with each request. I'm not sure altering the querystring between chunks is possible, or whether you can synchronously do some processing before the upload of the next chunk starts, but I would say it is worth a try.
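If the querystring route works out, the accumulated part ETags have to be flattened into a single string and parsed back on the server. Here is a minimal sketch of one possible encoding; the "partNumber:eTag" comma-separated format is my own invention for illustration, not anything Plupload or S3 prescribes:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: serialize part ETags into a querystring value and back.
// The "1:etagA,2:etagB" format is an assumption; S3 ETags are hex strings
// (no ':' or ','), so these separators do not collide with the values.
class PartETagCodec {
    // Map of partNumber -> eTag, flattened to "1:etagA,2:etagB".
    static String encode(Map<Integer, String> parts) {
        return parts.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    // Parse the querystring value back into partNumber -> eTag on the server.
    static Map<Integer, String> decode(String value) {
        Map<Integer, String> parts = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return parts;
        }
        for (String pair : value.split(",")) {
            String[] kv = pair.split(":", 2);
            parts.put(Integer.parseInt(kv[0]), kv[1]);
        }
        return parts;
    }
}
```

The client would append the encoded value on each ChunkUploaded event; the final request then carries everything needed to build the List&lt;PartETag&gt; and complete the upload.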

I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags.

This can be found in the PHP samples in the Plupload download: two POST parameters are sent by Plupload to the server:

  • chunks: total number of chunks in the upload (0 if the upload is not chunked)
  • chunk: index of the current chunk being uploaded

The last chunk is when chunks == 0 || chunk == chunks - 1.
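On the server that check can live in a tiny helper next to the chunk handler; the parameter names mirror what Plupload posts, but the class itself is just a sketch:

```java
// Decide whether a request carries the final chunk, based on the two POST
// parameters Plupload sends: "chunk" (0-based index of the current chunk)
// and "chunks" (total count, 0 when the upload is not chunked).
class ChunkInfo {
    static boolean isLastChunk(int chunk, int chunks) {
        return chunks == 0 || chunk == chunks - 1;
    }
}
```

When isLastChunk returns true, the handler would gather the accumulated PartETags and issue the CompleteMultipartUploadRequest.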
