
Node JS Streams: Understanding data concatenation

One of the first things you learn when you look at node's http module is this pattern for concatenating all of the data events coming from the request read stream:

let body = [];
request.on('data', chunk => {
  body.push(chunk);                         // collect each Buffer chunk
}).on('end', () => {
  body = Buffer.concat(body).toString();    // join into one string
});
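
In practice you usually also want error handling and a JSON.parse step. A minimal sketch that wraps the same pattern in a promise (the readJsonBody name is my own, not part of the http API):

```javascript
// Wrap the concatenation pattern in a promise, adding error
// handling and JSON parsing. Works with any readable stream,
// including an http request.
function readJsonBody(stream) {
  return new Promise((resolve, reject) => {
    const body = [];
    stream.on('data', chunk => body.push(chunk))
      .on('error', reject)                 // e.g. aborted connection
      .on('end', () => {
        try {
          resolve(JSON.parse(Buffer.concat(body).toString()));
        } catch (err) {
          reject(err);                     // body was not valid JSON
        }
      });
  });
}
```

In a handler you would call `readJsonBody(request).then(obj => ...)`.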

However, many streaming library implementations seem to gloss over this entirely. Also, when I inspect the request.on('data',...) event, it almost always emits only once for a typical JSON payload with a few to a dozen properties.

You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.

Is this because the request stream handling POST and PUT bodies pretty much only ever emits one data event, because their payload is way below the chunk partition size limit? In practice, how large would a JSON-encoded object need to be to be streamed in more than one data chunk?

It seems to me that objectMode streams don't need to worry about concatenating, because if you're dealing with an object it is almost always no larger than one emitted data chunk, which atomically transforms to one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful, as long as it could parse the individual objects in the collection and emit them one by one or in batches).

I find this to be probably the most confusing aspect of really understanding the node.js specifics of streams: there is a weird disconnect between streaming raw data and dealing with atomic chunks like objects. Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries? If someone could clarify this it would be very appreciated.

The job of the code you show is to collect all the data from the stream into one buffer so that when the end event occurs, you have all the data.

request.on('data',...) may emit only once or it may emit hundreds of times. It depends upon the size of the data, the configuration of the stream object and the type of stream behind it. You cannot ever reliably assume it will only emit once.

You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.

You only use this concatenating pattern when you are trying to get the entire data from the stream into a single variable. The whole point of piping to another stream is that you don't need to fetch the entire data from one stream before sending it to the next. .pipe() will just send data to the next stream as it arrives. The same goes for transforms.

Is this because the request stream handling POST and PUT bodies pretty much only ever emits one data event, because their payload is way below the chunk partition size limit?

It is likely because the payload is below some internal buffer size, the transport is sending all the data at once, you aren't running on a slow link, and so on. The point here is that you cannot make assumptions about how many data events there will be. You must assume there can be more than one, and that the first data event does not necessarily contain all the data, or data separated on a nice boundary. Lots of things can cause the incoming data to get broken up differently.

Keep in mind that a readStream reads data until there's momentarily no more data to read (up to the size of the internal buffer) and then it emits a data event. It doesn't wait until the buffer fills before emitting a data event. So, since all data at the lower levels of the TCP stack is sent in packets, all it takes is a momentary delivery delay with some packet and the stream will find no more data available to read and will emit a data event. This can happen because of the way the data is sent, because of things that happen in the transport over which the data flows or even because of local TCP flow control if lots of stuff is going on with the TCP stack at the OS level.

In practice, how large would a JSON encoded object need to be to be streamed in more than one data chunk?

You really should not know or care because you HAVE to assume that any size object could be delivered in more than one data event. You can probably safely assume that a JSON object larger than the internal stream buffer size (which you could find out by studying the stream code or examining internals in the debugger) WILL be delivered in multiple data events, but you cannot assume the reverse because there are other variables such as transport-related things that can cause it to get split up into multiple events.

It seems to me that objectMode streams don't need to worry about concatenating, because if you're dealing with an object it is almost always no larger than one emitted data chunk, which atomically transforms to one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful, as long as it could parse the individual objects in the collection and emit them one by one or in batches).

Object mode streams must do their own internal buffering to find the boundaries of whatever objects they are parsing so that they can emit only whole objects. At some low level, they are concatenating data buffers and then examining them to see if they yet have a whole object.

Yes, you are correct that if you were using an object mode stream and the objects themselves were very large, they could consume a lot of memory. This likely wouldn't be the most optimal way of dealing with that type of data.

Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries?

Yes, they do.


FYI, the first thing I do when making http requests is to use the request-promise library so I don't have to do my own concatenating. It handles all of this for you. It also provides a promise-based interface and about 100 other features which I find helpful.
