
Proper way to implement RESTful large file upload

I've been making REST APIs for some time now, and I'm still bugged by one case: large file upload. I've read a couple of other APIs, like Google Drive and Twitter, and other literature, and I have two ideas, but I'm not sure whether either of them is "proper". By "proper" I mean that it is somewhat standardized, doesn't need too much client logic (since other parties will be implementing that client), or, even better, that it can easily be called with cURL. The plan is to implement it in Java, preferably with the Play Framework.

Obviously I'll need some file partitioning and server-side buffering mechanism since the files are large.

So, the first solution I've got is a multipart upload (multipart/form-data). I get this approach and I have implemented it like this before, but it always feels strange to me to actually emulate a form on the client side, especially since the client has to set the file key name, and in my experience that is something clients tend to forget or not understand. Also, how is the chunk/part size dictated? What keeps the client from putting the whole file in one chunk?
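For reference, this is roughly what I mean by emulating a form: a minimal sketch of a multipart/form-data POST with java.net.http.HttpClient, where the endpoint URL and the "file" key name are just placeholders I made up. It also shows the problem I mentioned, since nothing stops the client from sending the whole file as one part read into memory:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MultipartUploadSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path file = Path.of("big-file.bin");
        String boundary = "----upload-" + System.currentTimeMillis();

        // HttpClient has no built-in multipart support, so the form body is assembled by hand:
        // a part header carrying the agreed-upon key name ("file" here), the raw bytes, a closing boundary.
        byte[] preamble = ("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + file.getFileName() + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n").getBytes(StandardCharsets.UTF_8);
        byte[] epilogue = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/files"))      // placeholder endpoint
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                // Reads the whole file into memory and sends it as a single part,
                // which is exactly the "whole file in one chunk" issue described above.
                .POST(HttpRequest.BodyPublishers.ofByteArrays(
                        List.of(preamble, Files.readAllBytes(file), epilogue)))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```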

Solution two, at least as I understood it, but without finding an actual implementation, is that a "regular" POST request can work. The content should be chunked and the data buffered on the server side. However, I am not sure this is a proper understanding. How is the data actually chunked? Does the upload span multiple HTTP requests, or is it chunked at the TCP level? And what is the Content-Type?
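For concreteness, this is the kind of single streaming POST I have in mind, as a sketch only. The URL and Content-Type are my guesses, and my understanding is that when the body length is not declared up front, an HTTP/1.1 client sends it with Transfer-Encoding: chunked, i.e. the chunking happens inside one request rather than across several:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingPostSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path file = Path.of("big-file.bin");

        // Stream the body from an InputStream; since no length is declared,
        // the HTTP/1.1 request body goes out with Transfer-Encoding: chunked.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/files"))    // placeholder endpoint
                .header("Content-Type", "application/octet-stream")  // my guess for the Content-Type
                .POST(HttpRequest.BodyPublishers.ofInputStream(() -> {
                    try {
                        return Files.newInputStream(file);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }))
                .build();

        HttpResponse<String> response = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)                // force HTTP/1.1 so chunked framing applies
                .build()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```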

Bottom line: which of these two (or anything else?) would be a client-friendly, widely understandable way of implementing a REST API for file upload?

I would recommend taking a look at the Amazon S3 REST API's solution to multipart file upload. The documentation can be found here.

To summarize the procedure Amazon uses:

  1. The client sends a request to initiate a multipart upload, and the API responds with an upload ID.

  2. The client uploads each file chunk with a part number (to maintain the ordering of the file), the size of the part, the MD5 hash of the part and the upload ID; each of these requests is a separate HTTP request. The API validates the chunk by checking that the MD5 hash of the received chunk matches the MD5 hash the client supplied and that the size of the chunk matches the size the client supplied. The API responds with a tag (unique ID) for the chunk. If you deploy your API across multiple locations you will need to consider how to store the chunks and later access them in a way that is location transparent.

  3. The client issues a request to complete the upload, which contains a list of each chunk number and the associated chunk tag (unique ID) received from the API. The API validates that there are no missing chunks and that the chunk numbers match the correct chunk tags, and then assembles the file or returns an error response. (A rough client-side sketch of this flow follows the list.)
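Here is a rough client-side sketch of that three-step flow in plain Java. The endpoints (/uploads, /uploads/{id}/parts/{n}, /uploads/{id}/complete), the response shapes and the 5 MB part size are made up for illustration; they are not Amazon's actual API:

```java
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class ChunkedUploadClientSketch {
    private static final String API = "https://api.example.com"; // hypothetical base URL
    private static final int PART_SIZE = 5 * 1024 * 1024;        // arbitrary 5 MB parts

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        Path file = Path.of("big-file.bin");

        // 1. Initiate the upload; assume the server answers with a plain-text upload ID.
        String uploadId = http.send(
                HttpRequest.newBuilder(URI.create(API + "/uploads"))
                        .POST(HttpRequest.BodyPublishers.noBody()).build(),
                HttpResponse.BodyHandlers.ofString()).body();

        // 2. Upload each part with its number and MD5 hash; collect the tag returned for each part.
        List<String> partTags = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[PART_SIZE];
            int read, partNumber = 1;
            while ((read = in.readNBytes(buffer, 0, PART_SIZE)) > 0) {
                byte[] part = Arrays.copyOf(buffer, read);
                String md5 = Base64.getEncoder()
                        .encodeToString(MessageDigest.getInstance("MD5").digest(part));
                HttpRequest req = HttpRequest.newBuilder(
                                URI.create(API + "/uploads/" + uploadId + "/parts/" + partNumber))
                        .header("Content-MD5", md5)
                        .PUT(HttpRequest.BodyPublishers.ofByteArray(part))
                        .build();
                partTags.add(http.send(req, HttpResponse.BodyHandlers.ofString()).body()); // tag for this part
                partNumber++;
            }
        }

        // 3. Complete: send the ordered list of part numbers and tags so the server can assemble the file.
        StringBuilder manifest = new StringBuilder("[");
        for (int i = 0; i < partTags.size(); i++) {
            if (i > 0) manifest.append(',');
            manifest.append("{\"partNumber\":").append(i + 1)
                    .append(",\"tag\":\"").append(partTags.get(i)).append("\"}");
        }
        manifest.append(']');
        HttpRequest complete = HttpRequest.newBuilder(URI.create(API + "/uploads/" + uploadId + "/complete"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(manifest.toString()))
                .build();
        System.out.println(http.send(complete, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```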

Amazon also supplies methods to abort the upload and to list the chunks associated with the upload. You may also want to consider a timeout for the upload, where the chunks are destroyed if the upload is not completed within a certain amount of time.

In terms of controlling the chunk sizes that the client uploads, you won't have much control over how the client decides to split up the upload. You could consider configuring a maximum chunk size for the upload and returning error responses for requests that contain chunks larger than the maximum size.
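A minimal sketch of that server-side check, using the JDK's built-in com.sun.net.httpserver purely for illustration (the 5 MB limit and the /uploads path are arbitrary). It only inspects the declared Content-Length; a robust implementation would also count bytes while streaming the part:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class MaxPartSizeSketch {
    private static final long MAX_PART_SIZE = 5L * 1024 * 1024; // arbitrary 5 MB limit

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/uploads", exchange -> {
            String lengthHeader = exchange.getRequestHeaders().getFirst("Content-Length");
            long declared = lengthHeader != null ? Long.parseLong(lengthHeader) : -1;

            if (declared > MAX_PART_SIZE) {
                // Reject oversized parts before buffering them: 413 Payload Too Large, no body.
                exchange.sendResponseHeaders(413, -1);
                exchange.close();
                return;
            }

            // ... otherwise read, validate and buffer the part (omitted) ...
            byte[] ok = "part accepted".getBytes();
            exchange.sendResponseHeaders(200, ok.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(ok);
            }
            exchange.close();
        });
        server.start();
    }
}
```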

I've found this procedure works very well for handling large file uploads in REST APIs and facilitates the handling of the many edge cases associated with file upload. Unfortunately, I've yet to find a library that makes it easy to implement in any language, so you pretty much have to write all of the logic yourself.

https://tus.io/ is a resumable upload protocol which helps with chunked uploading and with resuming an upload after a timeout. It is an open-source protocol and already has various client and server implementations in different languages.
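To give an idea of how the resume step works, here is a rough sketch of the core exchange using plain java.net.http rather than the official tus client libraries (the upload URL is a placeholder and error handling is omitted): the client asks the server how many bytes it already has with HEAD, reads the Upload-Offset header, and PATCHes the remaining bytes from that offset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class TusResumeSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient http = HttpClient.newHttpClient();
        Path file = Path.of("big-file.bin");
        URI uploadUrl = URI.create("https://tus.example.com/files/abc123"); // placeholder upload URL

        // Ask the server how many bytes it has already received for this upload.
        HttpResponse<Void> head = http.send(
                HttpRequest.newBuilder(uploadUrl)
                        .method("HEAD", HttpRequest.BodyPublishers.noBody())
                        .header("Tus-Resumable", "1.0.0")
                        .build(),
                HttpResponse.BodyHandlers.discarding());
        long offset = Long.parseLong(head.headers().firstValue("Upload-Offset").orElse("0"));

        // Resume by PATCHing the remaining bytes from that offset.
        try (InputStream in = Files.newInputStream(file)) {
            in.skipNBytes(offset);
            byte[] remaining = in.readAllBytes(); // fine for a sketch; a real client would stream this
            HttpRequest patch = HttpRequest.newBuilder(uploadUrl)
                    .method("PATCH", HttpRequest.BodyPublishers.ofByteArray(remaining))
                    .header("Tus-Resumable", "1.0.0")
                    .header("Upload-Offset", Long.toString(offset))
                    .header("Content-Type", "application/offset+octet-stream")
                    .build();
            System.out.println(http.send(patch, HttpResponse.BodyHandlers.discarding()).statusCode());
        }
    }
}
```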
