JBoss/WildFly clustering and Kubernetes

  1. Current configuration:
    • 16 pods running a JBoss TCP-based cluster with GOOGLE_PING discovery. The container is deployed as a StatefulSet on the Kubernetes cluster.
  2. Without load, the initial cluster worked as expected without a single issue, but when the load increased the following behaviour was observed:
    • Some of the pods became unavailable while handling the initial load and, as a result, were restarted automatically.
    • After a restart, those pods come up with new IP addresses, but the same hosts remain in the JBoss discovery file with their old IPs. As a result, the discovery file contains hosts with multiple IP addresses:

 aaa-ops-stage-0 b6418a02-4db3-0397-ba2b-5a4a3e274560 10.20.0.17:7800 F
 aaa-ops-stage-1 d57dc7b7-997f-236e-eb9f-a1604ddafc8f 10.20.0.10:7800 F
 aaa-ops-stage-1 63a54371-111e-f9e9-3de5-65c6f6ff9dcd 10.20.0.16:7800 F
 aaa-ops-stage-1 2dfeb3d8-6cc4-03e0-719e-b4dbb8a63815 10.20.1.13:7800 T
 aaa-ops-stage-0 8053ed47-ba1b-5bb1-fcd2-a2cffb154703 10.20.0.9:7800 F
 aaa-ops-stage-0 7068cd6c-ff83-dd5d-1610-e5c03f089605 10.20.0.9:7800 F
 aaa-ops-stage-0 6230152a-1bc7-30ed-0073-816224bcdc26 10.20.0.14:7800 F

  • When this happens and a pod is restarted, its boot is very slow because it tries to send a cluster message to every record in the discovery file above. Since aaa-ops-stage-0 now has only one (new) IP, the messages to all the stale aaa-ops-stage-0 entries simply time out. The more often pod 0 restarts, the more stale records accumulate in the discovery file, so boot time grows with every restart: each restart brings a new IP, and the number of timeouts keeps increasing.
  • Readiness probes are implemented in the pod configuration and are used to change the status of newly started pods, so the load balancer knows when a pod is ready to receive requests. Unfortunately, with the huge number of timeouts described above, the pod never fully boots, because the readiness probe restarts it after 60 seconds of being unavailable. Eventually all pods get stuck in a restart loop and the service stops completely. (A minimal sketch of this kind of setup follows this list.)
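
For context, here is a minimal sketch of the kind of StatefulSet and probe configuration described above. The names, image, health endpoint, and probe values are assumptions for illustration, not the actual configuration:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: aaa-ops-stage            # hypothetical, matching the host names above
    spec:
      serviceName: aaa-ops-stage     # headless Service backing the StatefulSet
      replicas: 16
      selector:
        matchLabels:
          app: aaa-ops-stage
      template:
        metadata:
          labels:
            app: aaa-ops-stage
        spec:
          containers:
          - name: wildfly
            image: example/wildfly-app:latest   # assumed image
            ports:
            - containerPort: 8080               # HTTP
            - containerPort: 7800               # JGroups cluster transport
            readinessProbe:          # gates load-balancer traffic; note that in
              httpGet:               # Kubernetes it is a liveness probe, not a
                path: /health        # readiness probe, that actually restarts a
                port: 8080           # pod (assumed endpoint)
              initialDelaySeconds: 60
              periodSeconds: 10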

I believe that if we could use sticky IPs, so that a pod that starts with 10.20.0.17 keeps that IP across restarts, we would avoid the behavior described above and there would be no timeouts. No timeouts would completely eliminate the restarts triggered by the readiness probes, and the service would stay up and running no matter the load we produce.

The question is whether there is any possibility to use static or sticky IP addresses for the running pods, and whether those IPs can persist across restarts. Any other suggestion is welcome as well!

There are a few ways to achieve your goal:

1. Use Kubernetes DNS names instead of IP addresses, as K.Nicholas wrote (see the sketch after these options).

2. Use the Calico CNI plugin and add annotations:

 annotations:
   cni.projectcalico.org/ipAddrs: "[\"192.168.0.1\"]"

to specify an IP address for your pods. Information on how to configure Calico in your cluster can be found in its documentation.
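
For option 1, the usual building block is a headless Service over the StatefulSet: each pod then gets a stable DNS name (for example aaa-ops-stage-0.aaa-ops-stage.<namespace>.svc.cluster.local) that survives restarts even though the pod IP changes. A minimal sketch, with names assumed to match the question:

    apiVersion: v1
    kind: Service
    metadata:
      name: aaa-ops-stage        # must match the StatefulSet's serviceName
    spec:
      clusterIP: None            # headless: DNS resolves directly to pod IPs
      selector:
        app: aaa-ops-stage
      ports:
      - name: jgroups
        port: 7800
        targetPort: 7800

JGroups can then be pointed at this Service for member discovery (for example via its DNS_PING protocol), so that membership is looked up by name rather than read from a file of stored IPs.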

By the way, using sticky IP addresses isn't considered good practice.
