
How to protect ZeroMQ Request Reply pattern against potential drops of messages?

I'm trying to implement a ZeroMQ pattern on the TCP layer between a C# application and distributed Python servers. I have a version working with the request-reply REQ/REP pattern, and it seems relatively stable when testing on localhost. However, while testing I debugged a few situations where I accidentally sent multiple requests before receiving a reply, which apparently is not acceptable.

In practice the network will likely drop a lot of packets, and I suspect that I'll be losing lots of replies and/or be unable to send requests.

1) Is there a way to reset the connection between REQ/REP request-reply sockets?
Would a ROUTER/DEALER pattern make more sense instead? As this is my first application with ZeroMQ, I was hoping to keep it simple.

2) Is there a good ZeroMQ mechanism for handling connectivity events? I've been reading "the guide" and there are a few mentions of monitoring connections, but no examples. I found the ZMonitor, but can't get the events to trigger in C#.

Ad 1) No,
there is no socket link-management interface exposed to the user for testing or resetting the state of the FSA-to-FSA link in the ZeroMQ framework.

Yes, XREQ/XREP (the legacy names of DEALER/ROUTER) may help you overcome the deadlocks that can and do happen in the REQ/REP Scalable Formal Communication Pattern:

Ref.: REQ/REP Deadlocks >>> https://stackoverflow.com/a/38163015/3666197
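To illustrate why DEALER/ROUTER sidesteps the lockstep problem, here is a minimal sketch using pyzmq. It assumes both ends run in one process on localhost purely for demonstration; the function name `demo`, the payloads and the 2000 ms poll timeouts are illustrative choices, not anything from the original post. A DEALER socket has no send/recv state machine, so sending two requests back to back is allowed, whereas a REQ socket would reject the second send.

```python
import zmq


def demo():
    ctx = zmq.Context()

    # Server side: ROUTER prefixes every incoming message with the peer identity.
    router = ctx.socket(zmq.ROUTER)
    port = router.bind_to_random_port("tcp://127.0.0.1")

    # Client side: DEALER imposes no strict send/recv alternation.
    dealer = ctx.socket(zmq.DEALER)
    dealer.connect(f"tcp://127.0.0.1:{port}")

    # Two requests in a row, without waiting for a reply in between.
    dealer.send(b"request-1")
    dealer.send(b"request-2")

    # The server answers each request, addressing the reply by identity.
    for _ in range(2):
        if not router.poll(2000):          # give up instead of blocking forever
            break
        identity, payload = router.recv_multipart()
        router.send_multipart([identity, b"reply-to-" + payload])

    # The client collects whatever replies arrive within the timeout.
    replies = []
    for _ in range(2):
        if dealer.poll(2000):
            replies.append(dealer.recv())

    dealer.close(linger=0)
    router.close(linger=0)
    ctx.term()
    return replies


if __name__ == "__main__":
    print(demo())
```

Note that the freedom comes at a price: with DEALER/ROUTER the application itself has to match replies to requests and decide what to do about the ones that never arrive.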

Fig.1: Why it is wrong to use a naive REQ/REP
All cases when [1] in_WaitToRecvSTATE_W2R + [2] in_WaitToRecvSTATE_W2R
are a principally unsalvageable mutual deadlock of the REQ-FSA/REP-FSA Finite-State-Automata, which will never reach the "next" in_WaitToSendSTATE_W2S internal state.

               XTRN_RISK_OF_FSA_DEADLOCKED ~ {  NETWORK_LoS
                                         :   || NETWORK_LoM
                                         :   || SIG_KILL( App2 )
                                         :   || ...
                                         :      }
                                         :
[App1]      ![ZeroMQ]                    :    [ZeroMQ]              ![App2] 
code-control! code-control               :    [code-control         ! code-control
+===========!=======================+    :    +=====================!===========+
|           ! ZMQ                   |    :    |              ZMQ    !           |
|           ! REQ-FSA               |    :    |              REP-FSA!           |
|           !+------+BUF> .connect()|    v    |.bind()  +BUF>------+!           |
|           !|W2S   |___|>tcp:>---------[*]-----(tcp:)--|___|W2R   |!           |
|     .send()>-o--->|___|           |         |         |___|-o---->.recv()     |
| ___/      !| ^  | |___|           |         |         |___| ^  | |!      \___ |
| REQ       !| |  v |___|           |         |         |___| |  v |!       REP |
| \___.recv()<----o-|___|           |         |         |___|<---o-<.send()___/ |
|           !|   W2R|___|           |         |         |___|   W2S|!           |
|           !+------<BUF+           |         |         <BUF+------+!           |
|           !                       |         |                     !           |
|           ! ZMQ                   |         |   ZMQ               !           |
|           ! REQ-FSA               |         |   REP-FSA           !           |
~~~~~~~~~~~~~ DEADLOCKED in W2R ~~~~~~~~ * ~~~~~~ DEADLOCKED in W2R ~~~~~~~~~~~~~
|           ! /\/\/\/\/\/\/\/\/\/\/\|         |/\/\/\/\/\/\/\/\/\/\/!           |
|           ! \/\/\/\/\/\/\/\/\/\/\/|         |\/\/\/\/\/\/\/\/\/\/\!           |
+===========!=======================+         +=====================!===========+

Fig.2: One may implement a free-stepping transmission layer using several pure ZeroMQ builtins and add some SIG-layer tools to get full control of all possible distributed-system states.

App1.PULL.recv( ZMQ.NOBLOCK ) and App1.PULL.poll( 0 ) are the obvious non-blocking calls on the App1 side.

[App1]      ![ZeroMQ]
code-control! code-control           
+===========!=======================+
|           !                       |
|           !+----------+           |         
|     .poll()|   W2R ___|.bind()    |         
| ____.recv()<----o-|___|-(tcp:)--------O     
| PULL      !|      |___|           |   :   
|           !|      |___|           |   :   
|           !|      |___|           |   :   
|           !+------<BUF+           |   :     
|           !                       |   :                           ![App2]
|           !                       |   :     [ZeroMQ]              ! code-control
|           !                       |   :     [code-control         ! once gets started ...
|           !                       |   :     +=====================!===========+
|           !                       |   :     |                     !           |
|           !                       |   :     |         +----------+!           |
|           !                       |   :     |         |___       |!           |
|           !                       |   :     |         |___| <--o-<.send()____ |
|           !                       |   :<<-------<tcp:<|___|   W2S|!      PUSH |
|           !                       |   :    .connect() <BUF+------+!           |
|           !                       |   :     |                     !           |
|           !                       |   :     |                     !           |
+===========!=======================+   :     +=====================!===========+
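The layout of Fig.2 can be sketched in pyzmq roughly as follows. This is only a sketch under stated assumptions: both "apps" live in one process on localhost, and the names `drain` and `demo`, the message payloads and the 2-second deadline are hypothetical. The PULL side binds and never blocks: it checks with `poll(0)` and reads with `recv(zmq.NOBLOCK)`, so App1 keeps control of its own loop whatever happens to App2.

```python
import time
import zmq


def drain(pull):
    """Collect whatever is queued right now, without ever waiting."""
    messages = []
    while pull.poll(0):                        # 0 ms: only check, never wait
        messages.append(pull.recv(zmq.NOBLOCK))
    return messages


def demo():
    ctx = zmq.Context()

    pull = ctx.socket(zmq.PULL)                # App1 side: .bind()
    port = pull.bind_to_random_port("tcp://127.0.0.1")

    push = ctx.socket(zmq.PUSH)                # App2 side: .connect()
    push.connect(f"tcp://127.0.0.1:{port}")

    nothing_yet = drain(pull)                  # nothing was sent so far

    for i in range(3):
        push.send(b"msg-%d" % i)

    # App1 stays free-stepping: it polls between other work until a deadline.
    received = []
    deadline = time.monotonic() + 2.0
    while len(received) < 3 and time.monotonic() < deadline:
        received += drain(pull)
        time.sleep(0.01)                       # placeholder for App1's other work

    push.close(linger=0)
    pull.close(linger=0)
    ctx.term()
    return nothing_yet, received


if __name__ == "__main__":
    print(demo())
```

A reply channel in the opposite direction would be a second PUSH/PULL pair built the same way, which is what makes the two sides independent of each other's state.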

Ad 2) No,
but one may create one's own "ZeroMQ-consumables" to test the distributed system's ability to set up a new transport/signalling socket, being ready to dispose of it if the RTO-test fails to prove that both (or multiple) sides are ready to set up and communicate over the ZeroMQ infrastructure. Notice that the problems are not only with the ZeroMQ layer: the App-side also need not be ready, or in a state to handle the expected communication interactions, and may cause soft-locks / dead-locks.
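One possible reading of such a disposable "consumable" is sketched below with pyzmq. Everything specific here is an assumption made for illustration: the helper names `probe` and `responder`, the PING/PONG payloads and the timeout values are hypothetical. The probe creates a fresh REQ socket, waits a bounded time for a reply, and throws the socket away in either case, so a lost reply never leaves a socket stuck in its wait-to-receive state.

```python
import threading
import zmq


def probe(ctx, endpoint, timeout_ms=500):
    """Return the peer's reply, or None if none arrived within timeout_ms."""
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)      # on close, drop anything still unsent
    sock.connect(endpoint)
    try:
        sock.send(b"PING")
        if sock.poll(timeout_ms):       # bounded wait instead of a blocking recv
            return sock.recv()
        return None
    finally:
        sock.close()                    # the socket is disposed of either way


def responder(ctx, info, ready, stop):
    """Stand-in for a remote peer: answers every request until told to stop."""
    rep = ctx.socket(zmq.REP)
    info["port"] = rep.bind_to_random_port("tcp://127.0.0.1")
    ready.set()
    while not stop.is_set():
        if rep.poll(50):
            rep.recv()
            rep.send(b"PONG")
    rep.close(linger=0)


def demo():
    ctx = zmq.Context()
    info, ready, stop = {}, threading.Event(), threading.Event()
    peer = threading.Thread(target=responder, args=(ctx, info, ready, stop))
    peer.start()
    ready.wait(2)
    endpoint = f"tcp://127.0.0.1:{info['port']}"

    alive = probe(ctx, endpoint)                   # peer is up and answering
    stop.set()
    peer.join()
    dead = probe(ctx, endpoint, timeout_ms=200)    # peer is gone: times out

    ctx.term()
    return alive, dead


if __name__ == "__main__":
    print(demo())
```

A caller would typically run such a probe before committing to the real traffic and retry or back off when it returns None; how many retries and which back-off to use is an application-level decision.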


The best next step?

What I can do for your further questions right now is to direct you to a bigger picture on this subject, with more arguments, a simple signalling-plane / messaging-plane illustration and a direct link to a must-read book from Pieter HINTJENS.
