
Streaming incoming requests with node.js

I am writing an API using node.js with Express. Part of the API will allow users to POST large payloads of binary data (perhaps hundreds of MB) to be stored in the server's database.

As it stands, the Express request handler does not get called until the entire upload has arrived and been buffered in memory on the server (req.body); only then can it be saved to the database. There are two things I don't like about this. First, holding all of that binary data at once requires a lot of server memory. Second, many databases, such as MongoDB and S3, support streaming writes, so you don't actually need all of the data in place before you start writing it, and there's no reason to wait around.

So my question is: can node (through Express or some other way) be configured to start streaming to the database before the entire request has come in?

After further research, I have found that the native "http" module does in fact support streaming in the way I described. I'm not sure whether Express supports this. I would guess that it does, but for an upload you probably cannot use the bodyParser middleware, since it buffers until the entire request body has been received.

Anyway, here is some code that shows how you can stream an incoming request to MongoDB's GridFS:

var http = require('http');
var mongo = require('mongodb');

var db = new mongo.Db('somedb', new mongo.Server("localhost", 27017), { safe: true });

db.open(function(err) {
    if (err)
        console.log(err);

    http.createServer(function(req, res) {
        var numToSave = 0;
        var endCalled = false;

        new mongo.GridStore(db, new mongo.ObjectID(), "w", { root: "fs", filename: "test" }).open(function(err, gridStore) {
            if (err)
                console.log(err);

            gridStore.chunkSize = 1024 * 256;

            req.on("data", function(chunk) {
                numToSave++;

                gridStore.write(chunk, function(err, gridStore) {
                    if (err)
                        console.log(err);

                    numToSave--;

                    if (numToSave === 0 && endCalled)
                        finishUp(gridStore, res);
                });
            });

            req.on("end", function() {
                endCalled = true;
                console.log("end called");

                if (numToSave === 0)
                    finishUp(gridStore, res);
            });
        });
    }).listen(8000);
});

function finishUp(gridStore, res) {
    gridStore.close();
    res.end();
    console.log("finishing up");
}

The gist is that the req object is actually a stream that emits "data" and "end" events. Every time a "data" event occurs, you write a chunk of data to Mongo. When the "end" event occurs, you close the GridFS file and send out the response.

There is some yuckiness involved in coordinating all the different async activities: you don't want to close the GridFS file before all of the data has actually been written out. I achieve this with a counter and a boolean, but there might be a cleaner way using a library.

