
What is the correct way to upload a large Koa request body to AWS S3?

I'm building an application backend. Clients POST a file as the request body to the server, and the server uploads the file to AWS S3. The server runs Node.js with the Koa web framework.
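For context, here is a minimal sketch of the Koa side, assuming a single catch-all middleware; the handler and the uploadToS3 helper are placeholders, not part of the original setup. The point is that the raw IncomingMessage is available as ctx.req, which is what gets handed to the upload logic.

const Koa = require('koa');
const app = new Koa();

app.use(async (ctx) => {
    // ctx.req is the raw Node.js IncomingMessage carrying the posted file.
    // uploadToS3 stands in for whatever upload logic the server uses (see below).
    const result = await uploadToS3(ctx.req);
    ctx.body = result;
});

app.listen(3000);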

If I use raw-body to read the POST body into a buffer, then for a large file the buffer is also large and causes an out-of-memory error.
If I pass ctx.req (an IncomingMessage object) directly to S3.putObject, the AWS SDK throws an error saying Cannot determine length of [object Object]; it looks like the AWS SDK tries to determine the length of the stream before starting the upload.
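Roughly, the two failing approaches look like the sketch below (bucket and key names are placeholders):

const getRawBody = require('raw-body');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function uploadWholeBody(ctx) {
    // Approach 1: buffer the entire body in memory first.
    // Works for small files, but a large file means a large buffer and an OOM error.
    const buffer = await getRawBody(ctx.req);
    await s3.putObject({ Bucket: 'my-bucket', Key: 'my-key', Body: buffer }).promise();
}

async function uploadStreamDirectly(ctx) {
    // Approach 2: hand the IncomingMessage straight to putObject.
    // putObject needs a known content length, so the SDK throws
    // "Cannot determine length of [object Object]".
    await s3.putObject({ Bucket: 'my-bucket', Key: 'my-key', Body: ctx.req }).promise();
}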

AWS SDK version 2.383.0 (currently the latest)
Node.js 10.14.2

For now, I have written a function that reads from the IncomingMessage as a stream, waits for data events to fill up a large buffer (16MB), and then does a multipart upload to S3. This solves the problem well, but I'm still looking for a better solution.

After months of running, I consider my final solution stable and reliable.

The main concept is to receive data from the IncomingMessage stream into a buffer; once the buffer reaches the part size, upload the current part to S3, then keep reading the stream until it ends.

const Crypto = require('crypto');
const Log4js = require('log4js');

const uploaderLogger = Log4js.getLogger('customUploader');
const uploadBufferMB = 16; // default part size in MB (the 16MB buffer described above)

function customMultiPartUpload(s3, bucket, key, incomingMessage, partSizeInByte) {
    return new Promise((resolve, reject) => {
        partSizeInByte = partSizeInByte || uploadBufferMB * 1024 * 1024;
        uploaderLogger.debug(`part size is ${partSizeInByte}`);

        let uploadId = null;
        let partNumber = 0;
        let parts = [];
        let fileSize = 0;
        let reserveBuffer = Buffer.alloc(0);
        const sendBuffer = Buffer.alloc(partSizeInByte);
        const md5Hash = Crypto.createHash('md5');

        // Upload one part; lazily create the multipart upload on the first part.
        const doUpload = async (uploadBuffer) => {
            if (!uploadId) {
                uploaderLogger.debug('multipart upload not initialized');
                const createData = await s3.createMultipartUpload({
                    Bucket: bucket,
                    Key: key
                }).promise();
                uploadId = createData.UploadId;
                uploaderLogger.debug(`uploadId ${uploadId}`);

                partNumber = 0;
            }
            fileSize += uploadBuffer.length;
            uploaderLogger.debug(`buffer length ${uploadBuffer.length}, total ${fileSize}`);

            partNumber += 1;
            uploaderLogger.debug(`part number ${partNumber}`);

            md5Hash.update(uploadBuffer);

            const partData = await s3.uploadPart({
                Bucket: bucket,
                Key: key,
                PartNumber: partNumber,
                UploadId: uploadId,
                Body: uploadBuffer
            }).promise();
            parts.push({
                PartNumber: partNumber,
                ETag: partData.ETag
            });
            uploaderLogger.debug(`etag ${partData.ETag}`);
        };

        incomingMessage.on('error', reject);

        incomingMessage.on('data', async (chunkBuffer) => {
            // Pause the stream while parts are uploaded so the buffer does not keep growing.
            incomingMessage.pause();

            reserveBuffer = Buffer.concat([ reserveBuffer, chunkBuffer ]);
            if (reserveBuffer.length > partSizeInByte) {
                try {
                    do {
                        reserveBuffer.copy(sendBuffer, 0, 0, partSizeInByte);
                        reserveBuffer = reserveBuffer.slice(partSizeInByte);
                        await doUpload(sendBuffer);
                    } while (reserveBuffer.length > partSizeInByte);
                } catch (err) {
                    reject(err);
                    return;
                }
            }

            incomingMessage.resume();
        });

        incomingMessage.on('end', async () => {
            uploaderLogger.debug('stream end');

            try {
                // Upload whatever is left as the final (possibly smaller) part.
                if (reserveBuffer.length > 0) {
                    await doUpload(reserveBuffer);
                }

                if (uploadId) {
                    uploaderLogger.debug('uploadId not null');
                    await s3.completeMultipartUpload({
                        Bucket: bucket,
                        Key: key,
                        UploadId: uploadId,
                        MultipartUpload: {
                            Parts: parts
                        }
                    }).promise();
                    uploaderLogger.debug('multipart upload complete');
                }
            } catch (err) {
                reject(err);
                return;
            }

            const hash = md5Hash.digest('hex');

            resolve({
                size: fileSize,
                hash: hash
            });
            uploaderLogger.debug(`return file size ${fileSize}, hash ${hash}`);
        });
    });
}

Adjust partSizeInByte to fit your server's memory usage: a part size that is too large may cause OOM when the server handles many concurrent requests, while a part size that is too small may fall below the S3 minimum part size (5 MB for every part except the last).
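For reference, one possible way to call this from a Koa handler; the bucket name, key, and the explicit 16 MB part size here are illustrative only:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

app.use(async (ctx) => {
    // Stream the request body straight into the multipart upload;
    // only about one part of data is held in memory at a time.
    const { size, hash } = await customMultiPartUpload(
        s3, 'my-bucket', `uploads/${Date.now()}`, ctx.req, 16 * 1024 * 1024);
    ctx.body = { size, md5: hash };
});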
