
Is it possible to stream data into a ZeroMQ message as it is being sent via UDP?

We're working on a latency-critical application running at 30 fps, with multiple stages in the pipeline (e.g. compression, network send, image processing, 3D calculations, texture sharing, etc.).

Ordinarily we could achieve these multiple stages like so:

[Process 1][Process 2][Process 3]

---------------time------------->

However, if we can stack these processes, then it is possible that as [Process 1] is working on the data, it is continuously passing its result to [Process 2]. This is similar to how iostream works in C++, i.e. "streaming". With threading, this can result in reduced latency:

[Process 1]
    [Process 2]
        [Process 3]
<------time------->

Let's presume that [Process 2] is our UDP communication (i.e. [Process 1] is on Computer A and [Process 3] is on Computer B).

The output of [Process 1] is approximately 3 MB (i.e. typically more than 300 jumbo packets at 9 KB each), so we can presume that when we call ZeroMQ's:

socket->send(message); // message size is 3 MB

then somewhere in the library or the OS the data is split into packets that are sent in sequence. This call presumes that the message is already fully formed.

Is there a way (e.g. an API) for parts of the message to be 'under construction' or 'constructed on demand' when sending large data over UDP? And would this also be possible on the receiving side (i.e. being allowed to act on the beginning of the message while the remainder is still incoming)? Or is the only option to split the data into smaller chunks ourselves?

Note:
the network connection is a direct GigE link between Computers A and B.

TL;DR: the simple answer is no; a single SLOC of ZeroMQ will not win this for your project. The project is doable, but a different design view is needed.

Having stated a minimum set of facts:
- 30 fps,
- 3 MB/frame,
- 3-stage processing pipeline,
- private host-to-host GigE interconnect,

there is not much to decide without further details.

Sure, there is a threshold of about 33.333 [ms] for the end-to-end pipeline processing (while you stand to lose some 30 [ms] of it straight away to network I/O), and the rest is left to the designer's hands. Ouch!

A latency-controlled design shall not skip a real-time I/O design phase

ZeroMQ is a workhorse, but that does not mean it can save a poor design.

If you spend a few moments with the timing constraints, you will see that the LAN network I/O latency is your worst enemy here.

Ref.: Latency numbers everyone should know

1) DESIGN the processing phases
2) BENCHMARK each phase's implementation model
3) PARALLELISE wherever Amdahl's Law says it is meaningful, and where possible

If your code allows for parallelised processing, your plans will make much better use of "progressive"-pipeline processing by using ZeroMQ's zero-copy / (almost) zero-latency / zero-blocking inproc: transport class to hand data between the processing phases as you go.

Remember, this is not a one-liner; do not expect a single SLOC to build your "progressive" pipelining for you.

[ns] matter; read the numbers from your data-processing micro-benchmarks carefully.
They decide your success.
Here you may see how much time is "lost" / "wasted" in just changing colour representations, which your code will need for object detection, 3D scene processing and texture post-processing. So set your design criteria to rather high standards.

Check the numbers in the lower-left window for the milliseconds lost in this real-time pipeline.

(Video: ATC LKPR-RWY)

If your code's processing requirements do not safely fit into your ~33,333,333 [ns] time budget with { quad | hexa | octa }-core CPU resources, and if the numerical processing may benefit from many-core GPU resources, there may be a case where Amdahl's Law well justifies an asynchronous multi-GPU-kernel processing approach, with its additional +21,000 ~ 23,000 [ns] lost in initial/terminal data transfers and +350 ~ 700 [ns] introduced by GPU.gloMEM -> GPU.SM.REG latency masking (which happily has enough quasi-parallel thread depth in your case of image processing, even for the low computational density of the expected trivial GPU kernels).
Ref.: GPU/CPU latencies one shall validate the initial design against.

No, you can't realistically do it. The API doesn't provide for it, and ZeroMQ guarantees that a receiver gets a complete message (including all parts of a multi-part message) or no message at all, which means it won't present a message to the receiver until it has been fully transferred. Splitting the data yourself into individually actionable chunks, sent as separate ZeroMQ messages, is the right thing to do here.

