
Ehcache JGroups replication using TCP not working in AWS cloud with a 2-node cluster

Below is the Ehcache configuration we are using. We use JGroups for cache replication.

ehcache.xml

<defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="1200"
        timeToLiveSeconds="86400"
        overflowToDisk="true"
        diskSpoolBufferSizeMB="30"
        maxElementsOnDisk="10000000"
        diskPersistent="false"
        diskExpiryThreadIntervalSeconds="120"
        memoryStoreEvictionPolicy="LRU">
    <cacheEventListenerFactory
            class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
            properties="replicateAsynchronously=true,replicatePuts=true,replicateUpdates=true,replicateUpdatesViaCopy=true,replicateRemovals=true" />
</defaultCache>
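The `cacheEventListenerFactory` above only replicates if ehcache.xml also declares a JGroups peer provider that loads the stack file, and that part of the configuration isn't shown here. A minimal sketch of what it might look like, assuming the ehcache-jgroupsreplication module's `file=` property (which names a JGroups XML stack on the classpath); check the documentation for the module version you use. In the AWS setup described further down, the same element would name jgroups_s3_config.xml instead.

```xml
<!-- Hypothetical entry inside the <ehcache> root element, next to defaultCache.
     Assumes the "file" property of JGroupsCacheManagerPeerProviderFactory,
     which points at a JGroups stack definition on the classpath. -->
<cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="file=jgroups_tcp_config.xml"/>
```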

jgroups_tcp_config.xml

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
   <!--Configure node ip inside bind_addr-->
   <TCP bind_addr="host1" bind_port="7831" max_bundle_size="9999999"/>
   <!--Configure nodes inside 'initial_hosts' property-->
   <TCPPING timeout="3000" initial_hosts="host1[7831],host2[7831]" port_range="1" num_initial_members="3"/>
   <FRAG2 frag_size="9999999"/>
   <MERGE3 max_interval="30000" min_interval="10000"/>
   <FD timeout="3000" max_tries="10"/>
   <VERIFY_SUSPECT timeout="1500"/>
   <pbcast.NAKACK use_mcast_xmit="false" exponential_backoff="500" discard_delivered_msgs="false"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/>
   <pbcast.GMS print_local_addr="true" join_timeout="5000" view_bundling="true"/>
</config>

Initially, the logs show that the nodes form a cluster and that messages are replicated across nodes. After some time, however, messages are no longer replicated, which results in erroneous behavior. Is there a problem with the JGroups configuration we are using?

We also tried NAKACK2 by simply replacing NAKACK with NAKACK2 in the configuration above, but then messages are not replicated across nodes at all. We are not sure where we are going wrong.
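Two details in the TCP stack may be worth checking before concluding that NAKACK2 itself is at fault. First, `exponential_backoff` is, to our knowledge, a NAKACK attribute that NAKACK2 does not recognize, and JGroups may refuse to start a stack containing unrecognized protocol attributes. Second, the stack has no unicast reliability protocol (UNICAST/UNICAST2), which the stock JGroups TCP stacks do include. The fragment below is a hypothetical replacement for the protocols from `pbcast.NAKACK` onward in jgroups_tcp_config.xml, borrowing the NAKACK2 and UNICAST2 lines from the S3 stack posted later in this thread. It assumes a JGroups release that ships NAKACK2 (the S3 config targets the 3.1 schema), so treat it as a starting point rather than a verified fix.

```xml
   <!-- Hypothetical replacement for the pbcast.NAKACK line onward in jgroups_tcp_config.xml.
        exponential_backoff is dropped (a NAKACK-only attribute, as far as we know);
        UNICAST2 is added for reliable unicast, mirroring the S3 stack. -->
   <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="false"/>
   <UNICAST2 timeout="300,600,1200"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/>
   <pbcast.GMS print_local_addr="true" join_timeout="5000" view_bundling="true"/>
```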

We faced the above issue in the AWS cloud. The Ehcache JGroups TCP setup did not work there because the cloud network doesn't support IP multicast, so node discovery did not happen. To address this, we use jgroups_s3_config.xml instead of jgroups_tcp_config.xml in AWS: its S3_PING protocol lets nodes discover each other through a shared S3 bucket. The following jgroups_s3_config.xml configuration resolved the issue.

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.1.xsd">
    <TCP loopback="true" bind_port="7800"/>
    <!-- The S3 bucket should be in the same region as the app servers.
         The access keys come from the AWS credentials file. -->
    <S3_PING location="YOUR_BUCKET_NAME"
             access_key="YOUR_ACCESS_KEY"
             secret_access_key="YOUR_SECRET_ACCESS_KEY"
             timeout="10000" num_initial_members="2"/>
    <FRAG2/>
    <MERGE2 min_interval="10000" max_interval="30000"/>
    <FD_ALL timeout="12000" interval="3000" timeout_check_interval="4000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="false"/>
    <UNICAST2 timeout="300,600,1200"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="40K"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000" view_bundling="true"/>
</config>

Additionally, we have to set JAVA_OPTS so that the JVM uses the IPv4 stack:

  export JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
