Ehcache Jgroups replication using TCP not working in AWS cloud with 2 node cluster

Question

Below is the Ehcache configuration we are using. We use JGroups for cache replication.

ehcache.xml

<defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="1200"
        timeToLiveSeconds="86400"
        overflowToDisk="true"
        diskSpoolBufferSizeMB="30"
        maxElementsOnDisk="10000000"
        diskPersistent="false"
        diskExpiryThreadIntervalSeconds="120"
        memoryStoreEvictionPolicy="LRU">
    <cacheEventListenerFactory
            class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
            properties="replicateAsynchronously=true,replicatePuts=true,replicateUpdates=true,replicateUpdatesViaCopy=true,replicateRemovals=true" />
</defaultCache>

jgroups_tcp_config.xml

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
   <!--Configure node ip inside bind_addr-->
   <TCP bind_addr="host1" bind_port="7831" max_bundle_size="9999999"/>
   <!--Configure nodes inside 'initial_hosts' property-->
   <TCPPING timeout="3000" initial_hosts="host1[7831],host2[7831]" port_range="1" num_initial_members="3"/>
   <FRAG2 frag_size="9999999"/>
   <MERGE3 max_interval="30000" min_interval="10000"/>
   <FD timeout="3000" max_tries="10"/>
   <VERIFY_SUSPECT timeout="1500"/>
   <pbcast.NAKACK use_mcast_xmit="false" exponential_backoff="500" discard_delivered_msgs="false"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/>
   <pbcast.GMS print_local_addr="true" join_timeout="5000" view_bundling="true"/>
</config>

Initially, the logs show that the nodes form a cluster and that messages are replicated across nodes. After some time, however, messages stop being replicated, which results in erroneous behavior. Is there a problem with the JGroups configuration we are using?

We also tried NAKACK2, but then messages are not replicated across nodes at all. We simply replaced NAKACK with NAKACK2 in the configuration above. We are not sure where we are going wrong.

Answer 1

We faced this issue in the AWS cloud. In our experience, Ehcache JGroups replication with the TCP configuration did not work there: the cloud network does not support multicast, so node discovery does not happen. To address this, we use jgroups_s3_config.xml instead of jgroups_tcp_config.xml in AWS. The following jgroups_s3_config.xml configuration resolved the issue for us.

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.1.xsd">
    <TCP loopback="true" bind_port="7800"/>
    <S3_PING location="s3 bucket name should be in the same region in which app servers are running" 
    access_key="s3 bucket access key from aws credential file" 
    secret_access_key="s3 bucket secret access key from aws credential file" timeout="10000" num_initial_members="2"/>
    <FRAG2/>
    <MERGE2 min_interval="10000" max_interval="30000"/>
    <FD_ALL timeout="12000" interval="3000" timeout_check_interval="4000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="false"/>
    <UNICAST2 timeout="300,600,1200"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="40K"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000" view_bundling="true"/>
</config>

Additionally, we have to set JAVA_OPTS so that the JVM prefers the IPv4 stack:

  export JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
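
Neither post shows how ehcache.xml points at the JGroups file, so here is a minimal sketch of that wiring. It assumes that the installed ehcache-jgroupsreplication version accepts a `file` property on the peer provider factory and that jgroups_s3_config.xml is on the classpath; both are assumptions, so please check them against the version you run.

```xml
<!-- Sketch only: the "file" property and the classpath location are assumptions -->
<ehcache>
    <!-- Tells Ehcache which JGroups stack to load for cluster membership -->
    <cacheManagerPeerProviderFactory
            class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
            properties="file=jgroups_s3_config.xml"/>

    <!-- The defaultCache element with the JGroupsCacheReplicatorFactory
         listener from the question goes here unchanged -->
</ehcache>
```

With this in place, switching between the TCP and S3 stacks should only require changing the file name in the `properties` attribute.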

solution1
0 2017-09-27 14:23:54
