
Apache Ignite Continuous Query Cache Transactions

We are using a continuous query to transfer data across all client nodes. However, our grid scales up and down, so we often run into an issue where a data node keeps trying to connect to a client that has already been scaled down, in order to send it data from the continuous query. This brings the system to a halt because the PME operation cannot acquire a lock, so the topology doesn't get updated.

To resolve this, I want to use the parameter TxTimeoutOnPartitionMapExchange, which should allow PME to proceed. However, in order to use this parameter, do I need to change the atomicityMode of my caches to transactional? If yes, will the process of a data node trying to send data from a continuous query count as a transaction?

In summary, I am trying to work out whether the TxTimeoutOnPartitionMapExchange parameter would help in my situation with the continuous query, and what the steps are to enable it.

EDIT: Stack traces of the issue I am trying to solve:

  • The continuous query keeps trying to reserve the client, and I believe it holds a global lock here, which blocks cache updates and checkpointing


Deadlock: false
    Completed: 1999706
Thread [name="sys-stripe-6-#7%pv-ib-valuation%", id=42, state=WAITING, blockCnt=52537, waitCnt=734400]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960)
        at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:2100)
        at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:2365)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1964)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1935)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1917)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1261)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:1059)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:90)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:459)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447)
        at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2657)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2118)
  • This starts showing up once the reserveClient call is stuck, as the thread is unable to acquire a lock


>>> Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5%pv-ib-valuation%
    Queue: []
    Deadlock: false
    Completed: 6328076
Thread [name="sys-stripe-4-#5%pv-ib-valuation%", id=40, state=WAITING, blockCnt=111790, waitCnt=2018248]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@66d8e343, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1663)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2715)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2679)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1051)
        at o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:243)
        at o.a.i.i.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:873)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1189)

So my analysis so far is that if a client is gone, the continuous query keeps trying to connect to it while holding a lock, which blocks everything.

  • Sample page locks dump. It's a similar page locks dump every time, and all threads just seem to be waiting, not holding any page locks


Page locks dump:

Thread=[name=checkpoint-runner-#94%pv-ib-valuation%, id=162], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#94%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#95%pv-ib-valuation%, id=163], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#95%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#96%pv-ib-valuation%, id=164], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#96%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#97%pv-ib-valuation%, id=165], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#97%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-0-#15%pv-ib-valuation%, id=50], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-0-#15%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-1-#16%pv-ib-valuation%, id=51], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-1-#16%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-10-#25%pv-ib-valuation%, id=60], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-10-#25%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-11-#26%pv-ib-valuation%, id=61], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-11-#26%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-12-#27%pv-ib-valuation%, id=62], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-12-#27%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-13-#28%pv-ib-valuation%, id=63], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-13-#28%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-14-#29%pv-ib-valuation%, id=64], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-14-#29%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-15-#30%pv-ib-valuation%, id=65], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-15-#30%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-2-#17%pv-ib-valuation%, id=52], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-2-#17%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-3-#18%pv-ib-valuation%, id=53], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-3-#18%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-4-#19%pv-ib-valuation%, id=54], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-4-#19%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-5-#20%pv-ib-valuation%, id=55], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-5-#20%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-6-#21%pv-ib-valuation%, id=56], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-6-#21%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-7-#22%pv-ib-valuation%, id=57], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-7-#22%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-8-#23%pv-ib-valuation%, id=58], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-8-#23%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-9-#24%pv-ib-valuation%, id=59], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-9-#24%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=db-checkpoint-thread-#93%pv-ib-valuation%, id=161], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=db-checkpoint-thread-#93%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=dms-writer-thread-#77%pv-ib-valuation%, id=145], state=WAITING
Locked pages = []
Locked pages log: name=dms-writer-thread-#77%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=exchange-worker-#71%pv-ib-valuation%, id=139], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#71%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=lock-cleanup-0, id=278], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-0 time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=lock-cleanup-scheduled-0, id=171], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-scheduled-0 time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=main, id=1], state=WAITING
Locked pages = []
Locked pages log: name=main time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=query-#5729%pv-ib-valuation%, id=6455], state=WAITING
Locked pages = []
Locked pages log: name=query-#5729%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5730%pv-ib-valuation%, id=6456], state=WAITING
Locked pages = []
Locked pages log: name=query-#5730%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5735%pv-ib-valuation%, id=6461], state=WAITING
Locked pages = []
Locked pages log: name=query-#5735%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5736%pv-ib-valuation%, id=6462], state=WAITING
Locked pages = []
Locked pages log: name=query-#5736%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-0-#1%pv-ib-valuation%, id=36], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-0-#1%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-1-#2%pv-ib-valuation%, id=37], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-1-#2%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-10-#11%pv-ib-valuation%, id=46], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-10-#11%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-11-#12%pv-ib-valuation%, id=47], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-11-#12%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-12-#13%pv-ib-valuation%, id=48], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-12-#13%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-13-#14%pv-ib-valuation%, id=49], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-13-#14%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-2-#3%pv-ib-valuation%, id=38], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-2-#3%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-3-#4%pv-ib-valuation%, id=39], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-3-#4%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-4-#5%pv-ib-valuation%, id=40], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-4-#5%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-5-#6%pv-ib-valuation%, id=41], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-5-#6%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-6-#7%pv-ib-valuation%, id=42], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-6-#7%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-7-#8%pv-ib-valuation%, id=43], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-7-#8%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-8-#9%pv-ib-valuation%, id=44], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-8-#9%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-9-#10%pv-ib-valuation%, id=45], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-9-#10%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=ttl-cleanup-worker-#62%pv-ib-valuation%, id=127], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=ttl-cleanup-worker-#62%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

TxTimeoutOnPartitionMapExchange is about rolling back active transactions to unblock a PME process. It won't magically unblock every PME that could be stuck for other reasons.

It's certainly worth having this setting configured in any case. To enable it, you need to adjust your server nodes' configuration and set this property to some value, like 30 seconds. Here is an example of XML changes.
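A minimal sketch of what such an XML change might look like, assuming the server nodes are started from a Spring XML configuration file. The value is given in milliseconds, so 30000 corresponds to the 30 seconds mentioned above; the rest of the IgniteConfiguration bean is omitted and would stay as it is in your existing file.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ... existing properties of your server node configuration ... -->

    <property name="transactionConfiguration">
        <bean class="org.apache.ignite.configuration.TransactionConfiguration">
            <!-- Roll back transactions that block PME for longer than 30 seconds (value in ms). -->
            <property name="txTimeoutOnPartitionMapExchange" value="30000"/>
        </bean>
    </property>
</bean>
```

The timeout value here is illustrative only; pick one that suits how long your longest legitimate transactions are expected to run.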

Speaking of the original CQ issue with client disconnects, I'd expect Ignite to handle that automatically with no problems. In other words, I don't think the hung PME is caused by the continuous query itself, but rather by something else, such as active transactions without a timeout.

You don't need to change the atomicityMode of your caches. Transactions can't be applied to a non-transactional (atomic) cache, so updates to atomic caches are not covered by this timeout.
