

Hazelcast - client mode topology / distributed map lock issue

Below is a description of a problem we faced in production. Please note that I could not reproduce the issue in a test or local environment and therefore cannot provide test code.

We have a Hazelcast cluster with two members, M1 and M2, and three clients, C1, C2 and C3. The Hazelcast version is 3.9.

Clients call IMap.tryLock() with a timeout of 10 seconds. After acquiring the lock, they perform critical, long-running operations and finally release the lock with IMap.unlock().
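A minimal sketch of the acquire/work/release pattern described above, assuming a single JVM: `java.util.concurrent.locks.ReentrantLock` stands in for the per-key locks of a Hazelcast IMap, so it shows the shape of the code (timed `tryLock`, `unlock` in `finally`) but none of the distributed behavior. `KeyLockSketch` and `runLocked` are hypothetical names. Against Hazelcast itself the corresponding calls would be `map.tryLock(key, 10, TimeUnit.SECONDS)` and `map.unlock(key)`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical local stand-in for per-key IMap locks; not Hazelcast code.
public class KeyLockSketch {
    // One lock per key, created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the critical section only if the key's lock is acquired within the timeout.
     * Returns false if the timeout expires while another thread still holds the key.
     */
    public boolean runLocked(String key, long timeout, TimeUnit unit, Runnable criticalSection)
            throws InterruptedException {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        if (!lock.tryLock(timeout, unit)) {
            return false; // timed out: someone else owns the key
        }
        try {
            criticalSection.run();
            return true;
        } finally {
            lock.unlock(); // always attempt the release, even if the critical section throws
        }
    }
}
```

The `finally` block guarantees that the release is attempted even when the critical section throws; on its own it cannot guarantee that a release request reaches a remote cluster.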

The problem that occurred in production is as follows:

At some instant t, we first saw a heartbeat failure to M2 on client C2. Afterwards there were errors fetching the partition table, caused by com.hazelcast.spi.exception.TargetDisconnectedException:

[hz.client_0.internal-2                       ] WARN  [] HeartbeatManager               - hz.client_0 [mygroup] [3.9] HeartbeatManager failed to connection: .....

[hz.client_0.internal-3                       ] WARN  [] ClientPartitionService         - hz.client_0 [mygroup] [3.9] Error while fetching cluster partition table!
java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, ......

About 250 ms after the initial heartbeat failure, the client disconnects and then reconnects within 20 ms.

[hz.client_0.cluster-                         ] INFO  [] LifecycleService               - hz.client_0 [mygroup] [3.9] HazelcastClient 3.9 (20171023 - b29f549) is CLIENT_DISCONNETED

[hz.client_0.cluster-                         ] INFO  [] LifecycleService               - hz.client_0 [mygroup] [3.9] HazelcastClient 3.9 (20171023 - b29f549) is CLIENT_CONNECTED

The problem is that, for some keys previously acquired by C2, C1 and C3 cannot acquire the lock even though C2 appears to have released it. C2 can still acquire the lock, but this introduces delays that are unacceptable for the application. Since the lock has been released, every client should be able to acquire it.

We learned about the problem through complaints and then restarted client application C2.

As documented in http://docs.hazelcast.org/docs/latest-development/manual/html/Distributed_Data_Structures/Lock.html , locks acquired by a restarted member (C2 in my case) appear to be removed after the restart.

The issue currently seems to have gone away, but we are not sure whether it will recur.

Do you have any suggestions about the probable cause and, more importantly, any recommendations?

Would enabling redo-operation on the client help in this case?

As I tried to explain, the client seems to recover from the problem, but the keys remain locked in the cluster, and this is fatal for my application.

Thanks

It looks like the client lost ownership of the lock because it was disconnected from the cluster. In cases like yours, you can use the IMap#forceUnlock API. It releases the lock regardless of the lock owner; it always unlocks successfully, never blocks, and returns immediately.
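To illustrate the difference between an owner-checked `unlock` and `forceUnlock`, here is a hypothetical, single-process toy model. It is not Hazelcast's implementation: `OwnedLockTable` is an invented name, and ownership is reduced to a plain string such as a client name, which is an assumption made for illustration. The model is reentrant, so it reproduces one scenario consistent with the symptoms in the question: the owner that still holds a key can re-acquire it, every other owner is refused, and only `forceUnlock` frees the key without the owner's cooperation. Against a real cluster the call would be `map.forceUnlock(key)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of owner-checked, reentrant locks; not Hazelcast code.
public class OwnedLockTable {
    private final Map<String, String> ownerByKey = new HashMap<>();
    private final Map<String, Integer> holdCount = new HashMap<>();

    /** Non-blocking: succeeds if the key is free or already held by the same owner. */
    public synchronized boolean tryLock(String key, String owner) {
        String current = ownerByKey.get(key);
        if (current == null) {
            ownerByKey.put(key, owner);
            holdCount.put(key, 1);
            return true;
        }
        if (current.equals(owner)) {
            holdCount.merge(key, 1, Integer::sum); // reentrant acquire
            return true;
        }
        return false; // held by a different owner
    }

    /** Owner-checked release: a non-owner is rejected with IllegalMonitorStateException. */
    public synchronized void unlock(String key, String owner) {
        if (!owner.equals(ownerByKey.get(key))) {
            throw new IllegalMonitorStateException(owner + " does not hold the lock on " + key);
        }
        int remaining = holdCount.merge(key, -1, Integer::sum);
        if (remaining == 0) {
            ownerByKey.remove(key);
            holdCount.remove(key);
        }
    }

    /** Releases the key regardless of owner or hold count, mirroring forceUnlock semantics. */
    public synchronized void forceUnlock(String key) {
        ownerByKey.remove(key);
        holdCount.remove(key);
    }
}
```

Because `forceUnlock` ignores ownership, calling it while another client is legitimately inside its critical section would let a second client enter at the same time, so it fits best as a recovery step for keys known to be stranded.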
