Kubernetes 上的 Apache Ignite 未加入集群

Question

I am trying to setup a simple two node ignite cluster with kubernetes.我正在尝试使用 kubernetes 设置一个简单的两节点 ignite 集群。 The same configuration works fine when running on VM directly.直接在 VM 上运行时，相同的配置工作正常。

Essentially I have two pods that are microservice written in Vertx with Ignite as embedded node , pod1 exposes 9090 via service1 and pod2 exposes 9092 via service2基本上我有两个吊舱是微服务写在Vertx在Ignite为嵌入式节点，术后第一天通过公开9090 service1通过与POD2自曝9092 service2

Both the pods use ignite-service to expose the Ignite discovery ports 47100 and 47500 and both pods implements KubernetesIPFinder两个 Pod 都使用ignite-service来公开 Ignite 发现端口 47100 和 47500，并且两个 Pod 都实现了 KubernetesIPFinder

pod1 --> service1(9090, 10900)   |
                                 | --> ignite-service (47100/TCP,47500/TCP)
pod2 --> service2(9092, 10900)   |               ^
                                                 |
    KubernetesIPFinder----------------------------
            ns = ignite-ns
            svc = ignite-service
        ServiceAccount (ignite-account)

When both pods start I can see the discovery happening but the second pod always hangs with the below logs.当两个 Pod 启动时，我可以看到正在发生的发现，但第二个 Pod 总是挂起以下日志。 I am not sure if this because of the way I configured the k8s objects or some resource contention in the k8s.我不确定这是因为我配置 k8s 对象的方式还是 k8s 中的某些资源争用。

If changed the configuration to use thin clients for the pods then everything works fine.如果将配置更改为对 Pod 使用瘦客户端，则一切正常。 The pods are able to start ignite and expose the rest endpoints of the vertx application Pod 能够启动并暴露 vertx 应用程序的其余端点

[INFO ] 2020-08-26 16:09:08.969 [main] IgniteKernal%aztecCommunityUserIgnite - VM arguments: [-Xms1g, -Xmx1g, -XX:MaxGCPauseMillis=500, -XX:GCPauseIntervalMillis=30000, -XX:InitiatingHeapOccupancyPercent=60, -XX:G1ReservePercent=30, -XX:+HeapDumpOnOutOfMemoryError, -XX:+DisableExplicitGC, -Djava.net.preferIPv4Stack=true, -XX:+UseG1GC, -Xlog:gc*,safepoint,age*,ergo*:file=/app/aztec/logs/gc-%p-%t.log:tags,uptime,time,level:filecount=10,filesize=50m, -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true, -DIGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=300000, -Dlog4j.configurationFile=file:///app/aztec/communityuser_service/conf/log4j2.xml, -DIGNITE_WAIT_FOR_BACKUPS_ON_SHUTDOWN=true, -DIGNITE_NO_SHUTDOWN_HOOK=true, -DIGNITE_WAL_MMAP=false]
[INFO ] 2020-08-26 16:09:08.970 [main] IgniteKernal%aztecCommunityUserIgnite - System cache's DataRegion size is configured to 40 MB. Use DataStorageConfiguration.systemRegionInitialSize property to change the setting.
[INFO ] 2020-08-26 16:09:08.970 [main] IgniteKernal%aztecCommunityUserIgnite - Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor - Configured plugins:
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor -   ^-- None
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor -
[INFO ] 2020-08-26 16:09:09.059 [main] FailureProcessor - Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[WARN ] 2020-08-26 16:09:09.278 [main] TcpCommunicationSpi - Failure detection timeout will be ignored (one of SPI parameters has been set explicitly)
[INFO ] 2020-08-26 16:09:09.299 [main] TcpCommunicationSpi - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[WARN ] 2020-08-26 16:09:09.302 [main] TcpCommunicationSpi - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[WARN ] 2020-08-26 16:09:09.312 [main] NoopCheckpointSpi - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[WARN ] 2020-08-26 16:09:09.337 [main] GridCollisionManager - Collision resolution is disabled (all jobs will be activated upon arrival).
[INFO ] 2020-08-26 16:09:09.341 [main] IgniteKernal%aztecCommunityUserIgnite - Security status [authentication=off, tls/ssl=off]
[INFO ] 2020-08-26 16:09:09.392 [main] TcpDiscoverySpi - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=11e43ce8-b846-41ac-b688-9c6c34aebcf9]
[INFO ] 2020-08-26 16:09:09.421 [main] PdsFoldersResolver - Successfully created new persistent storage folder [/app/aztec/data/ignite/db/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c]
[INFO ] 2020-08-26 16:09:09.422 [main] PdsFoldersResolver - Consistent ID used for local node is [6cd407c6-0c86-4e57-9803-ab56bec5b16c] according to persistence data storage folders
[INFO ] 2020-08-26 16:09:09.423 [main] CacheObjectBinaryProcessorImpl - Resolved directory for serialized binary metadata: /app/aztec/data/ignite/binary_meta/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.637 [main] FilePageStoreManager - Resolved page store work directory: /app/aztec/data/ignite/db/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.637 [main] FileWriteAheadLogManager - Resolved write ahead log work directory: /app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.638 [main] FileWriteAheadLogManager - Resolved write ahead log archive directory: /app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.951 [main] FileHandleManagerImpl - Initialized write-ahead log manager [mode=BACKGROUND]
[WARN ] 2020-08-26 16:09:09.954 [main] GridCacheDatabaseSharedManager - DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
[INFO ] 2020-08-26 16:09:09.975 [main] GridCacheDatabaseSharedManager - Configured data regions initialized successfully [total=4]
[INFO ] 2020-08-26 16:09:09.993 [main] PartitionsEvictManager - Evict partition permits=2
[WARN ] 2020-08-26 16:09:10.029 [main] IgniteH2Indexing - Serialization of Java objects in H2 was enabled.
[INFO ] 2020-08-26 16:09:10.251 [main] ClientListenerProcessor - Client connector processor has started on TCP port 10900
[INFO ] 2020-08-26 16:09:10.324 [main] GridTcpRestProtocol - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[INFO ] 2020-08-26 16:09:10.374 [main] IgniteKernal%aztecCommunityUserIgnite - Non-loopback local IPs: 172.17.239.163
[INFO ] 2020-08-26 16:09:10.375 [main] IgniteKernal%aztecCommunityUserIgnite - Enabled local MACs: 2255F14C9361
[INFO ] 2020-08-26 16:09:10.381 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.388 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.391 [main] GridCacheDatabaseSharedManager - Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.428 [main] GridCacheDatabaseSharedManager - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheDatabaseSharedManager - Finished applying WAL changes [updatesApplied=0, time=0 ms]
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheProcessor - Restoring partition state for local groups.
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheProcessor - Finished restoring partition state for local groups [groupsProcessed=0, partitionsProcessed=0, time=0ms]
[INFO ] 2020-08-26 16:09:10.483 [main] FilePageStoreManager - Cleanup cache stores [total=1, left=0, cleanFiles=false]
[INFO ] 2020-08-26 16:09:10.491 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.492 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.493 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.502 [main] GridCacheDatabaseSharedManager - Configured data regions started successfully [total=4]
[INFO ] 2020-08-26 16:09:10.503 [main] GridCacheDatabaseSharedManager - Starting binary memory restore for: [-2100569601]
[INFO ] 2020-08-26 16:09:10.518 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.518 [main] GridCacheDatabaseSharedManager - Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.522 [main] FileWriteAheadLogManager - Resuming logging to WAL segment [file=/app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c/0000000000000000.wal, offset=0, ver=2]
[INFO ] 2020-08-26 16:09:10.684 [main] GridCacheProcessor - Started cache in recovery mode [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
[INFO ] 2020-08-26 16:09:10.689 [main] GridCacheDatabaseSharedManager - Binary recovery performed in 186 ms.
[INFO ] 2020-08-26 16:09:10.690 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.690 [main] GridCacheDatabaseSharedManager - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.692 [main] GridCacheDatabaseSharedManager - Finished applying WAL changes [updatesApplied=0, time=0 ms]
[INFO ] 2020-08-26 16:09:10.692 [main] GridCacheProcessor - Restoring partition state for local groups.
[INFO ] 2020-08-26 16:09:10.703 [main] GridCacheProcessor - Finished restoring partition state for local groups [groupsProcessed=1, partitionsProcessed=0, time=10ms]
[INFO ] 2020-08-26 16:09:10.738 [main] TcpDiscoverySpi - Connection check threshold is calculated: 300000

[INFO ] 2020-08-26 16:11:18.387 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery accepted incoming connection [rmtAddr=/172.17.239.64, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.395 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery spawning a new thread for connection [rmtAddr=/172.17.239.64, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.396 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Started serving remote node connection [rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.399 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Received ping request from the remote node [rmtNodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.400 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Finished writing ping response [rmtNodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.400 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Finished serving remote node connection [rmtAddr=/172.17.239.64:34837, rmtPort=34837
[INFO ] 2020-08-26 16:13:25.749 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery accepted incoming connection [rmtAddr=/172.17.239.64, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.749 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery spawning a new thread for connection [rmtAddr=/172.17.239.64, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.750 [tcp-disco-sock-reader-[]-#5%aztecCommunityUserIgnite%] TcpDiscoverySpi - Started serving remote node connection [rmtAddr=/172.17.239.64:36858, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.752 [tcp-disco-sock-reader-[f4df02cf 172.17.239.64:36858]-#5%aztecCommunityUserIgnite%] TcpDiscoverySpi - Initialized connection with remote server node [nodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:36858]
[INFO ] 2020-08-26 16:13:25.772 [tcp-disco-msg-worker-[]-#2%aztecCommunityUserIgnite%] TcpDiscoverySpi - New next node [newNext=TcpDiscoveryNode [id=f4df02cf-0700-4f31-93b0-9073c9394d2d, consistentId=b003163e-ef90-450a-885c-6d7e9b0cbef4, addrs=ArrayList [127.0.0.1, 172.17.193.243], sockAddrs=HashSet [sit-aztec-authentication-service/192.168.164.225:47500, /127.0.0.1:47500, /172.17.193.243:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1598458405757, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false]]

Update:更新：

IgniteConfiguration:点燃配置：

[INFO ] 2020-08-26 16:25:03.364 [main] IgniteKernal%aztecAuthIgnite - IgniteConfiguration [igniteInstanceName=aztecAuthIgnite, pubPoolSize=8, svcPoolSize=8, callbackPoo
lSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=1, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=
2, qryPoolSize=8, sqlQryHistSize=1000, dfltQryTimeout=0, igniteHome=null, igniteWorkDir=/app/aztec/data/ignite, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@d554c5f,
 nodeId=3b17a57c-6ee6-4225-bc50-a762f6ec50af, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=150000, netCompressionLevel=1, s
ndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeo
ut=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, sk
ipAddrsRandomization=false], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connect
Gate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@60c38c44, chConnPlc=null, enableForcibleNodeKill=false, enableTroub
leshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=
5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections
=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=-1, boundTc
pShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@1ee2a1e2[Count = 1], stopping=false, metricsLsnr=null],
 evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@59ae2de7, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.
indexing.noop.NoopIndexingSpi@38bb9fad, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@11620476, clientMode=false, rebalanceThrea
dPoolSize=4, rebalanceTimeout=10000, rebalanceBatchesPrefetchCnt=3, rebalanceThrottle=0, rebalanceBatchSize=524288, txCfg=TransactionConfiguration [txSerEnabled=false,
dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000, pessimisticTxLogSize=0, pessimist
icTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissed
CacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=300000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30
000, metricsLogFreq=60000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768,
rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=1, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFa
ctory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, c
acheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStora
geConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=4096, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize
=131072000, initSize=26214400, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5,
 metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true], dataRegions=null, storagePath=db, checkpointFreq=60000, lo
ckWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=250000000, walSegments=4, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal, metricsEnabled=false, walMode=BACKGROUND, walTlbSize=131072, walBuffSize=33554432, walFlushFreq=5000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@25a02442, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=true, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null, walPageCompression=DISABLED, walPageCompressionLevel=null], activeOnStart=true, autoActivation=false, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=sit-aztec-authentication-service, port=10900, portRange=10, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=64, threadPoolSize=8, idleTimeout=0, handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration [maxActiveTxPerConn=100]], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=null, commFailureRslvr=null]

Answer 1

I figured what the issue is here.我想出了这里的问题。 Apparently it has to do with the way I configured the service object in kubernetes.显然这与我在 kubernetes 中配置服务对象的方式有关。 I am not sure if this is a bug or a feature but it looks like an Ignite node can only scale to node and not across nodes.我不确定这是错误还是功能，但看起来 Ignite 节点只能扩展到节点而不是跨节点。 What I mean by this is the service object should be unique to a node.我的意思是服务对象应该是节点唯一的。 If you share the service object across nodes (microservices) expecting the cluster to spread across multiple nodes it will hang.如果您跨节点（微服务）共享服务对象，希望集群分布在多个节点上，它将挂起。 (I am not sure if this is an anti-pattern) What worked was keeping the service object unique to the node and then scaling the node if required. （我不确定这是否是一种反模式）有效的是保持节点独有的服务对象，然后根据需要扩展节点。

I think if this is the case then we should probably keep the ignite nodes as a separate cluster and not embedded within the micro-services.我认为如果是这种情况，那么我们可能应该将 ignite 节点保留为一个单独的集群，而不是嵌入到微服务中。

Answer 2

as per your Kubernetes Deployment, you likely have defined a readinessProbe on your spec.template.spec.container根据您的 Kubernetes 部署，您可能已经在spec.template.spec.container上定义了一个readinessProbe

This will prevent the Pod to be registered as Endpoints under the Kubernetes Service , and each Ignite embedded node will start its own cluster of 1 node :-/这将阻止 Pod 在 Kubernetes Service下注册为Endpoints ，并且每个 Ignite 嵌入式节点将启动自己的 1 个节点的cluster ：-/

give it a try without the readinessProbe and see if your Ignite nodes join the same cluster.在没有readinessProbe情况下尝试一下，看看您的 Ignite 节点是否加入同一个集群。

see Ignite ReadinessProbe参见Ignite ReadinessProbe

Kubernetes 上的 Apache Ignite 未加入集群

问题描述

2 个解决方案

解决方案1
1 已采纳 2020-09-02 05:17:40

解决方案2
0 2020-09-03 11:31:37

Kubernetes 上的 Apache Ignite 未加入集群

问题描述

2 个解决方案

解决方案1 1 已采纳 2020-09-02 05:17:40

解决方案2 0 2020-09-03 11:31:37

解决方案1
1 已采纳 2020-09-02 05:17:40

解决方案2
0 2020-09-03 11:31:37