
Node.js: Stream file without saving to memory

I am building an API that needs to accept file uploads. A user can POST a file to an endpoint; the file is sent to a virus scan and, if it is clean, on to storage (probably S3). So far I have achieved this with one issue: the files are temporarily saved in the application's file system. I need to design an app that doesn't keep uploaded files on the server itself. Here is my currently working code:

app.js

const express = require('express');
const bb = require('express-busboy');

const app = express();

// express-busboy extends the Express app to handle incoming files
bb.extend(app, {
    upload: true,
    path: './tmp'
});

Routes.js

const express = require('express');
const router = express.Router();
const fileManagementService = require('./file-management-service')();

router
.route('/:fileId')
.post(async (req, res, next) => {
    try {
        const {fileId} = req.params;
        const {files} = req;
        const response = await fileManagementService.postFile(files, fileId);

        res.status(201).json(response);
    } catch (err) {
        next(err);
    }
});

module.exports = router;

file-management-service.js

const fs = require('fs');

function createUploader() {
    // POST /:fileId
    async function postFile(data, fileId) {
        const {file} = data.file;
        const fileStream = fs.createReadStream(file);
        const scanOutput = await scanFile(fileStream); // Function scans file for viruses
        const status = scanOutput.status === 'OK';
        let upload = 'NOT UPLOADED';
        if (status) {
            upload = await postS3Object({file}); // Some function that sends the file to S3 or other storage
        }
        fs.unlinkSync(file);
        return {
            fileId,
            scanned: scanOutput,
            upload 
        };
    }

    return Object.freeze({
        postFile
    });
}

module.exports = createUploader;

As mentioned, the above works as expected: the file is sent to be scanned, then sent to an S3 bucket, and a response to that effect is returned to the poster. However, my implementation of express-busboy stores the file in the ./tmp folder, and I then convert it into a readable stream using fs.createReadStream(filePath) before sending it to the AV, and again in the function that sends the file to S3.

This API is hosted in a Kubernetes cluster and I need to avoid creating state. How can I achieve the above without actually saving the file? I'm guessing busboy receives this file as some sort of stream, so, at the risk of sounding dense, can it not just remain a stream and be piped through these functions to achieve the same outcome?

You can use busboy at a slightly lower level and get access to the readstream it produces for each uploaded file. Here's an example from the busboy doc that can be adapted for your situation:

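// Written against the pre-1.0 busboy API, as in the original doc example;
// later busboy releases changed the export and the 'file' event arguments.
const http = require('http');
const os = require('os');
const path = require('path');
const fs = require('fs');
const Busboy = require('busboy');

http.createServer(function(req, res) {
  if (req.method === 'POST') {
    var busboy = new Busboy({ headers: req.headers });
    busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
      // os.tmpdir() replaces the removed os.tmpDir()
      var saveTo = path.join(os.tmpdir(), path.basename(fieldname));
      file.pipe(fs.createWriteStream(saveTo));
    });
    busboy.on('finish', function() {
      res.writeHead(200, { 'Connection': 'close' });
      res.end("That's all folks!");
    });
    return req.pipe(busboy);
  }
  res.writeHead(404);
  res.end();
}).listen(8000, function() {
  console.log('Listening for requests');
});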

The key part is this, which I've annotated:

    // create a new busboy instance on each incoming request that has files with it
    var busboy = new Busboy({ headers: req.headers });

    // register for the file event
    busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
      // at this point the file argument is a readstream for the data of an uploaded file
      // you can do whatever you want with this readstream such as
      // feed it directly to your anti-virus 

      // this example code saves it to a tempfile
      // you would replace this with code that sends the stream to your anti-virus
      var saveTo = path.join(os.tmpdir(), path.basename(fieldname));
      file.pipe(fs.createWriteStream(saveTo));
    });

    // this recognizes the end of the upload stream and sends 
    // whatever you want the final http response to be
    busboy.on('finish', function() {
      res.writeHead(200, { 'Connection': 'close' });
      res.end("That's all folks!");
    });

    // this gets busboy started, feeding the incoming request to busboy
    // so it can start reading it and parsing it and will eventually trigger
    // one or more "file" events
    return req.pipe(busboy);
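
As a rough illustration of what could replace the temp-file write above, here is a minimal sketch of a consumer that reads a stream chunk by chunk without touching the file system. `scanStream` and `SIGNATURE` are hypothetical stand-ins, not part of busboy or of any anti-virus library; a real client would more likely forward the chunks to a scanner service than search them itself.

```javascript
const { Readable } = require('stream');

// Hypothetical marker standing in for whatever a real anti-virus engine detects.
const SIGNATURE = Buffer.from('MALWARE-TEST-SIGNATURE');

// Reads the stream once, chunk by chunk. Nothing is written to disk and only
// a short tail of the previous chunk is kept, so a marker split across two
// chunks is still found.
async function scanStream(readable) {
  let tail = Buffer.alloc(0);
  let bytes = 0;
  let infected = false;

  for await (const chunk of readable) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    bytes += buf.length;
    if (!infected) {
      const haystack = Buffer.concat([tail, buf]);
      infected = haystack.includes(SIGNATURE);
      // Keep the last (signature length - 1) bytes for the next round.
      const keepFrom = Math.max(0, haystack.length - (SIGNATURE.length - 1));
      tail = haystack.subarray(keepFrom);
    }
    // Keep looping after a match so the source is always fully drained.
  }

  return { status: infected ? 'INFECTED' : 'OK', bytes };
}

// Usage with an in-memory stream in place of the upload:
scanStream(Readable.from([Buffer.from('hello '), Buffer.from('world')]))
  .then((result) => console.log(result.status, result.bytes));
```

Inside the `file` handler this would be called as `scanStream(file)` in place of `file.pipe(fs.createWriteStream(saveTo))`. The loop keeps reading after a match because busboy expects each file stream to be consumed before it can finish parsing the request.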

When you've identified an incoming request that should get this custom busboy handling, you create an instance of Busboy, pass it the headers and register for the file event. That file event gives you a new file readstream containing the data of the uploaded file. You can then pipe that stream directly to your anti-virus without ever going through the file system.
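
One thing to keep in mind for the scan-then-upload flow in the question: a stream can only be read once, so the same upload cannot be piped to the scanner and then re-read for storage the way the temp file was. One possible approach, sketched here under assumptions, is to feed both consumers from the same source at the same time and let the caller decide what to do with the stored object if the scan comes back bad. `fanOut`, `countBytes` and `collectText` are hypothetical names; the two consumers merely stand in for the virus scan and the S3 upload.

```javascript
const { PassThrough, Readable } = require('stream');

// Sends every chunk of `source` to each consumer through its own PassThrough
// branch, so no consumer needs a file on disk. The slowest consumer sets the pace.
function fanOut(source, consumers) {
  const results = consumers.map((consume) => {
    const branch = new PassThrough();
    // pipe() does not forward errors, so pass them on by hand.
    source.once('error', (err) => branch.destroy(err));
    source.pipe(branch);
    return consume(branch);
  });
  return Promise.all(results);
}

// Hypothetical stand-in for the virus scan: counts the bytes it sees.
async function countBytes(readable) {
  let total = 0;
  for await (const chunk of readable) total += chunk.length;
  return total;
}

// Hypothetical stand-in for the storage upload: collects the data as text.
async function collectText(readable) {
  const chunks = [];
  for await (const chunk of readable) chunks.push(chunk);
  return Buffer.concat(chunks).toString('utf8');
}

// Usage: both consumers receive the complete upload from a single pass.
fanOut(Readable.from([Buffer.from('abc'), Buffer.from('de')]), [countBytes, collectText])
  .then(([bytes, text]) => console.log(bytes, text));
```

With busboy, `source` would be the `file` argument of the `file` event. Every branch has to be read to the end, otherwise the shared source stalls once the unread branch's buffer fills up.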
