WebSphere MQ client topic silent access error after 100 minutes

We're integrating communication using topics in WebSphere MQ. We use the C# library for MQ, version 8; the DLLs are from mqc8_8.0.0.3_win64.zip, downloaded from the official IBM website. We connect to the server without a problem, then we access a specified topic, set the subscription as durable, and provide the userID. Then we enter an infinite loop, asking every 2 minutes whether new messages have been published to this topic. This works great. If the client publishes messages, we get them. If we get disconnected without removing the subscription, we can resume it after reconnecting and the messages are there. Connection-wise it seems OK.

The problem is that after some semi-idle time (just asking for new messages, but every time receiving code 2033, no new messages) something stops working. There is, however, no other error code (such as a network error). We continuously get code 2033, but we can no longer receive messages even if they are put there. If we disconnect (completely close the client application) and reconnect, the messages are there and it works fine for another period of time.

Debugging with a network packet sniffer revealed that almost exactly 100 minutes after connecting and accessing the topic, our client stops sending the periodic "get" messages. It does, however, send heartbeat messages every 5 minutes from this point on; this seems to be an automatic feature of the client libraries. Yet client-side logging shows that I am actually still sending out requests for new messages, and each time I keep getting code 2033 as a response, even if messages are actually there. Because this happens so punctually every 100 minutes, we think it's some kind of timeout, but we're unable to determine which one. After some searching I found this in IBM's documentation, about the disconnect interval being set to 100 minutes: http://www-01.ibm.com/support/knowledgecenter/SSFKSJ_7.5.0/com.ibm.mq.ref.con.doc/q081860_.htm However, after contacting the MQ server administrators at the other company, I was assured that they in fact have this value set to 0, so this should not be the cause. Also, according to the network sniffing, it looks like the client stops sending the Get messages rather than the server disconnecting us.

There is an even bigger puzzle. We tried a soft disconnect of the queue manager, followed by reconnecting and re-accessing the topics, but it does not help, as if some static fields were kept even with new instances of the queue manager. We need to completely shut down the client program to be able to receive messages. And all this time, we don't get any error messages other than code 2033 (no new message).

Now for some code. This is used every time for connection/reconnection:

public void Connect()
{
    MQEnvironment.Hostname = connectionName;//please assume those are correctly filled values
    MQEnvironment.Port = port;
    MQEnvironment.Channel = channelName;
    queueManager = new MQQueueManager(QueueManagerName);
}

Next, we access the topic.

public MQTopic AccessTopic(string topicName)
{
    // Create the durable, managed subscription, or resume it if it already exists
    int options = MQC.MQSO_CREATE | MQC.MQSO_FAIL_IF_QUIESCING | MQC.MQSO_MANAGED
                | MQC.MQSO_DURABLE | MQC.MQSO_RESUME;
    MQTopic topic = queueManager.AccessTopic(topicName, null, options, null, "subNameXYZ");
    return topic;
}

Next, we read the topic. All the functions use try/catch statements, but I've cleaned them up a bit to make them easier to look at. This runs in a loop, every 2 minutes.

public string ReadTopic(MQTopic topic)
{
    string strReturn = "";
    if (topic != null)
    {
        try
        {
            queueMessage = new MQMessage();
            queueMessage.Format = MQC.MQFMT_STRING;
            queueGetMessageOptions = new MQGetMessageOptions();
            topic.Get(queueMessage, queueGetMessageOptions);
            strReturn = queueMessage.ReadString(queueMessage.MessageLength);
            queueMessage.ClearMessage();
        }
        catch (MQException exp)
        {
            // checking whether exp.ReasonCode is 2033 (MQRC_NO_MSG_AVAILABLE, "no new message")
        }
    }
    return strReturn;
}

In addition, on every loop iteration, before calling ReadTopic, we check whether the connection is OK and, if not, reconnect, as follows:

public void CheckConnection()
{
    if (!queueManager.IsConnected)
    {
        queueManager.Disconnect();
        queueManager.Close();
        Connect();
    }
}

So, in short: what can cause our connection to stop receiving messages from the topic after almost exactly 100 minutes every time, even though there are no error messages and new messages are published to this topic after those 100 minutes? Side question: why does a soft reconnection not work, so that we need to shut down the program completely to be able to access the messages?

There are almost no scenarios in MQ that might cause this behavior. It is possible, for example, for a browse cursor to not see newer, higher-priority messages that arrive. Incomplete message groups can also return a 2033 despite a non-zero queue depth. However, your description does not support either of these scenarios as the cause.

However, this part seems to indicate a bug in the MQ classes:

Debugging with a network packet sniffer revealed that almost exactly 100 minutes after connecting and accessing the topic, our client stops sending the periodic "get" messages. It does, however, send heartbeat messages every 5 minutes from this point on; this seems to be an automatic feature of the client libraries. Yet client-side logging shows that I am actually still sending out requests for new messages, and each time I keep getting code 2033 as a response, even if messages are actually there.

The classes cannot reliably return a 2033 unless they first ask the QMgr for messages. If your packet captures are complete (i.e. the network flows of interest did not traverse a thread or socket that wasn't being captured), then the behavior reported by the classes does not match the behavior actually performed. If you can reliably reproduce it under trace, IBM should be able to resolve it in a PMR.

Until that time, you may be forced to implement a workaround such as periodically restarting the app. You might also try creating a managed subscription to a pre-defined queue and changing the app to poll that queue. If the problem is isolated to the Topic object, this would fix it without disturbing any other subscribers on that topic.
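To illustrate the second workaround, here is a minimal sketch of polling a queue instead of the MQTopic object. It assumes that a subscription delivering the topic's publications to a pre-defined queue has already been set up on the queue manager. The class name QueuePoller, the method name TryReadQueue, the queue name "APP.SUB.QUEUE" in the usage note, and the 5-second wait interval are all hypothetical; I have not tested this against the setup in the question.

```csharp
using IBM.WMQ;

public static class QueuePoller
{
    // Hypothetical helper: returns the next message body, or null when the
    // queue is empty (reason code 2033, MQRC_NO_MSG_AVAILABLE).
    public static string TryReadQueue(MQQueueManager queueManager, string queueName)
    {
        // Open the pre-defined destination queue for input.
        MQQueue queue = queueManager.AccessQueue(
            queueName, MQC.MQOO_INPUT_AS_Q_DEF | MQC.MQOO_FAIL_IF_QUIESCING);
        try
        {
            MQMessage message = new MQMessage();
            MQGetMessageOptions options = new MQGetMessageOptions();
            // Wait briefly for a message instead of returning immediately.
            options.Options = MQC.MQGMO_WAIT | MQC.MQGMO_FAIL_IF_QUIESCING;
            options.WaitInterval = 5000; // milliseconds; an assumed value

            queue.Get(message, options);
            return message.ReadString(message.MessageLength);
        }
        catch (MQException ex)
        {
            if (ex.ReasonCode == MQC.MQRC_NO_MSG_AVAILABLE)
            {
                return null; // 2033: nothing to read right now
            }
            throw; // any other reason code is a real error
        }
        finally
        {
            queue.Close();
        }
    }
}
```

The existing loop could then call QueuePoller.TryReadQueue(queueManager, "APP.SUB.QUEUE") in place of ReadTopic(topic). Rethrowing every reason code other than 2033 is a deliberate choice here, so that a failure cannot be mistaken for an empty queue.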
