
Unexpected gossip interval growth and message size growth in Akka Distributed Data

I'm using Akka Distributed Data to replicate order data between three nodes. Currently, I'm only using the GSet and LWWRegister CRDT types. However, shortly after a message is sent to the replicator, the gossip interval increases and the replicator logs a serialization error saying that the maximum allowed message size has been exceeded. The same error occurs even with a much smaller message value, such as a single char. After about 10 minutes of operation, the JVM runs out of heap memory.

I followed the Akka documentation to implement the system.

I searched for a solution for a few days and have already tried changing application.conf to allow a larger frame size, increasing the ddata gossip interval, and isolating the data replication part of the application and running it on its own. I also tried the methods described in several related posts, but none of them gave me a proper solution.

Here is the part of my application.conf related to Akka remoting and Akka ddata:

remote.artery {
    enabled = on
    transport = tcp
    canonical.port = 5053
    canonical.hostname = 127.0.0.1
    advanced {
        maximum-frame-size = 256KiB
        buffer-pool-size = 128
        maximum-large-frame-size = 4MiB
        large-buffer-pool-size = 32
    }
.....

akka.cluster.distributed-data {
  name = ddataReplicator
  role = "AthenaLB_1"
  gossip-interval = 2 s
  notify-subscribers-interval = 500 ms
  max-delta-elements = 1000
  use-dispatcher = ""
  pruning-interval = 120 s
  max-pruning-dissemination = 300 s
  pruning-marker-time-to-live = 6 h
  serializer-cache-time-to-live = 60 s

  # Settings for delta-CRDT
  delta-crdt {
    enabled = on
    max-delta-size = 200
  }

  durable {
    keys = []
    pruning-marker-time-to-live = 10 d
    store-actor-class = akka.cluster.ddata.LmdbDurableStore
    use-dispatcher = akka.cluster.distributed-data.durable.pinned-store
    pinned-store {
      executor = thread-pool-executor
      type = PinnedDispatcher
    }

    lmdb { 
      dir = "ddata"
      map-size = 100 MiB
      write-behind-interval = off
    }
  }
}

Here are the methods I have used to replicate the state between nodes. Please note that I have omitted the unnecessary parts of the code.

private final ActorRef replicator = DistributedData.get(getContext().getSystem()).replicator();
private final SelfUniqueAddress node = DistributedData.get(getContext().getSystem()).selfUniqueAddress();
private static final Replicator.ReadConsistency readMajority = new Replicator.ReadMajority(Duration.ofSeconds(30));
private final Replicator.WriteConsistency writeMajority = new Replicator.WriteMajority(Duration.ofSeconds(30));


private void replicateState(ExchangeSupervisorProtos.SavedExchangeList recovery) {
    LOGGER.info("Sending replication message: {}", recovery.toString());
    Replicator.Update<LWWRegister<ExchangeSupervisorProtos.SavedExchangeList>> update =
            new Replicator.Update<>(
                    exchangeSupervisorRecoveryKey,
                    LWWRegister.create(node, recovery),
                    writeMajority,
                    curr -> updateExchangeSupervisorRecovery(curr, recovery));
    replicator.tell(update, self());
}

private LWWRegister<ExchangeSupervisorProtos.SavedExchangeList> updateExchangeSupervisorRecovery(
        LWWRegister<ExchangeSupervisorProtos.SavedExchangeList> data,
        ExchangeSupervisorProtos.SavedExchangeList recovery) {
    // Overwrite the register value; the default clock assigns the timestamp.
    return data.withValue(node, recovery, LWWRegister.defaultClock());
}

private void replicateState(ExchangeRecovery recovery) {
    LOGGER.info("Sending replication message: {}", recovery.toString());
    Replicator.Update<GSet<ExchangeRecovery>> update =
            new Replicator.Update<>(
                    exchangeRecoveryKey,
                    GSet.create(),
                    writeMajority,
                    curr -> updateExchangeRecovery(curr, recovery));
    replicator.tell(update, self());
}

private GSet<ExchangeRecovery>
    updateExchangeRecovery(GSet<ExchangeRecovery> data,
                           ExchangeRecovery recovery) {
    return data.add(recovery);
}
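
For readers unfamiliar with the two CRDT types used above, here is a minimal plain-Java sketch of their merge rules. It is a simplified model, not Akka's implementation: the class names `CrdtSketch`, `GrowOnlySet` and `LwwRegister` are hypothetical, node identifiers are plain strings, and the tie-break on equal timestamps (lower node id wins) is an assumption made for this sketch. It may help to see that a grow-only set can never shrink after a merge, while a last-writer-wins register keeps only one value.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Plain-Java models of the two merge rules; hypothetical classes, not Akka's code.
final class CrdtSketch {

    // Grow-only set: elements can be added but never removed.
    static final class GrowOnlySet<T> {
        private final Set<T> elements;

        GrowOnlySet(Set<T> elements) {
            this.elements = Collections.unmodifiableSet(new HashSet<T>(elements));
        }

        static <T> GrowOnlySet<T> empty() {
            return new GrowOnlySet<T>(new HashSet<T>());
        }

        // Returns a new set that also contains the given element.
        GrowOnlySet<T> add(T element) {
            Set<T> copy = new HashSet<T>(elements);
            copy.add(element);
            return new GrowOnlySet<T>(copy);
        }

        // Merge is set union, so the result is never smaller than either input.
        GrowOnlySet<T> merge(GrowOnlySet<T> other) {
            Set<T> union = new HashSet<T>(elements);
            union.addAll(other.elements);
            return new GrowOnlySet<T>(union);
        }

        int size() {
            return elements.size();
        }
    }

    // Last-writer-wins register: the value with the highest timestamp survives a merge.
    static final class LwwRegister<T> {
        final String node;
        final T value;
        final long timestamp;

        LwwRegister(String node, T value, long timestamp) {
            this.node = node;
            this.value = value;
            this.timestamp = timestamp;
        }

        LwwRegister<T> merge(LwwRegister<T> other) {
            if (other.timestamp > timestamp) return other;
            if (other.timestamp < timestamp) return this;
            // Equal timestamps: assumed tie-break for this sketch, lower node id wins.
            return other.node.compareTo(node) < 0 ? other : this;
        }
    }

    public static void main(String[] args) {
        GrowOnlySet<String> a = GrowOnlySet.<String>empty().add("order-1").add("order-2");
        GrowOnlySet<String> b = GrowOnlySet.<String>empty().add("order-2").add("order-3");
        System.out.println(a.merge(b).size()); // prints 3

        LwwRegister<String> older = new LwwRegister<String>("node-a", "v1", 100L);
        LwwRegister<String> newer = new LwwRegister<String>("node-b", "v2", 200L);
        System.out.println(older.merge(newer).value); // prints v2
    }
}
```

In this model every `add` on the set increases the replicated state permanently, whereas the register's state stays the size of its single current value.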

Here is a part of the node's logs.

[ERROR] [04/09/2019 11:54:48.330] [AlgoEngine-akka.remote.default-remote-dispatcher-5] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:54:56.824] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:02.724] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:07.974] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:14.427] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:18.940] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:24.855] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:30.995] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:36.779] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:44.069] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[WARN] [04/09/2019 11:55:49.770] [AlgoEngine-akka.actor.default-dispatcher-2] [akka.cluster.Cluster(akka://AlgoEngine)] Cluster Node [akka://AlgoEngine@127.0.0.1:5053] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [2006] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC.
[WARN] [04/09/2019 11:55:49.772] [AlgoEngine-akka.actor.default-dispatcher-2] [akka.remote.PhiAccrualFailureDetector@61981fa5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5051: 2007 millis
[WARN] [04/09/2019 11:55:49.776] [AlgoEngine-akka.actor.default-dispatcher-8] [akka.remote.PhiAccrualFailureDetector@4c4e6ca5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5052: 2008 millis
[ERROR] [04/09/2019 11:55:51.887] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:55:58.012] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:04.127] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:10.890] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:16.821] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:23.487] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:29.192] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:35.278] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:41.683] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[WARN] [04/09/2019 11:56:52.906] [AlgoEngine-akka.actor.default-dispatcher-31] [akka.cluster.Cluster(akka://AlgoEngine)] Cluster Node [akka://AlgoEngine@127.0.0.1:5053] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [3143] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC.
[WARN] [04/09/2019 11:56:52.911] [AlgoEngine-akka.actor.default-dispatcher-8] [akka.remote.PhiAccrualFailureDetector@61981fa5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5051: 3148 millis
[WARN] [04/09/2019 11:56:52.917] [AlgoEngine-akka.actor.default-dispatcher-11] [akka.remote.PhiAccrualFailureDetector@4c4e6ca5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5052: 3151 millis
[ERROR] [04/09/2019 11:56:56.014] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:56:57.216] [AlgoEngine-akka.remote.default-remote-dispatcher-5] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5052/system/ddataReplicator#-1928484884]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:57:02.880] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[ERROR] [04/09/2019 11:57:10.264] [AlgoEngine-akka.remote.default-remote-dispatcher-4] [Encoder(akka://AlgoEngine)] Failed to serialize oversized message [akka.cluster.ddata.Replicator$Internal$Gossip].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://AlgoEngine@127.0.0.1:5051/system/ddataReplicator#-1087129379]): max allowed size 262144 bytes. Message type [akka.cluster.ddata.Replicator$Internal$Gossip].

[WARN] [04/09/2019 11:58:51.367] [AlgoEngine-akka.actor.default-dispatcher-28] [akka.remote.PhiAccrualFailureDetector@61981fa5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5051: 4549 millis
[ERROR] [SECURITY][04/09/2019 11:58:51.370] [AlgoEngine-akka.actor.default-dispatcher-24] [akka.actor.ActorSystemImpl(AlgoEngine)] Uncaught error from thread [AlgoEngine-akka.actor.default-dispatcher-24]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[AlgoEngine]
java.lang.OutOfMemoryError: Java heap space
    at akka.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:66)
    at akka.cluster.ddata.protobuf.ReplicatedDataSerializer.toBinary(ReplicatedDataSerializer.scala:376)
    at akka.cluster.ddata.protobuf.SerializationSupport.buildOther$1(SerializationSupport.scala:144)
    at akka.cluster.ddata.protobuf.SerializationSupport.otherMessageToProto(SerializationSupport.scala:160)
    at akka.cluster.ddata.protobuf.SerializationSupport.otherMessageToProto$(SerializationSupport.scala:138)
    at akka.cluster.ddata.protobuf.ReplicatorMessageSerializer.otherMessageToProto(ReplicatorMessageSerializer.scala:151)
    at akka.cluster.ddata.protobuf.ReplicatorMessageSerializer.dataEnvelopeToProto(ReplicatorMessageSerializer.scala:485)
    at akka.cluster.ddata.protobuf.ReplicatorMessageSerializer.toBinary(ReplicatorMessageSerializer.scala:233)
    at akka.cluster.ddata.Replicator.digest(Replicator.scala:1789)
    at akka.cluster.ddata.Replicator.getDigest(Replicator.scala:1778)
    at akka.cluster.ddata.Replicator.$anonfun$gossipTo$1(Replicator.scala:1924)
    at akka.cluster.ddata.Replicator$$Lambda$918/1723713864.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
    at scala.collection.TraversableLike$$Lambda$7/512549200.apply(Unknown Source)
    at scala.collection.immutable.Map$Map2.foreach(Map.scala:159)
    at scala.collection.TraversableLike.map(TraversableLike.scala:237)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at akka.cluster.ddata.Replicator.gossipTo(Replicator.scala:1924)
    at akka.cluster.ddata.Replicator.$anonfun$receiveGossipTick$1(Replicator.scala:1916)
    at akka.cluster.ddata.Replicator.$anonfun$receiveGossipTick$1$adapted(Replicator.scala:1916)
    at akka.cluster.ddata.Replicator$$Lambda$914/1652707776.apply(Unknown Source)
    at scala.Option.foreach(Option.scala:274)
    at akka.cluster.ddata.Replicator.receiveGossipTick(Replicator.scala:1916)
    at akka.cluster.ddata.Replicator$$anonfun$4.applyOrElse(Replicator.scala:1491)
    at akka.actor.Actor.aroundReceive(Actor.scala:539)
    at akka.actor.Actor.aroundReceive$(Actor.scala:537)
    at akka.cluster.ddata.Replicator.aroundReceive(Replicator.scala:1349)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:610)
    at akka.actor.ActorCell.invoke(ActorCell.scala:579)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
    at akka.dispatch.Mailbox.run(Mailbox.scala:229)

[WARN] [04/09/2019 11:58:51.377] [AlgoEngine-akka.actor.default-dispatcher-11] [akka.remote.PhiAccrualFailureDetector@4c4e6ca5] heartbeat interval is growing too large for address akka://AlgoEngine@127.0.0.1:5052: 4553 millis

Maybe my approach is wrong, since I have only been working with Akka ddata for two weeks. If somebody knows why this is happening, or can suggest a possible cause or solution, please help. Thank you.

This issue was resolved by changing the following value in the remoting config.

    remote.artery {
        enabled = on
        transport = tcp
        canonical.port = 5053
        canonical.hostname = 127.0.0.1
        advanced {
                maximum-frame-size = 2MiB #Was 256KiB

Posting as the answer since I was able to resolve it on my own and didn't get any answers.
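
The "max allowed size 262144 bytes" in the error log corresponds to the original 256KiB setting, since 256 × 1024 = 262144. The following is a small sketch of that arithmetic in plain Java. The class `FrameSizeCheck` and its methods are hypothetical helpers, not part of Akka, the 300,000-byte payload is an assumed value for illustration only, and the check ignores any envelope or header overhead that the transport adds.

```java
// Hypothetical helper, not part of Akka: converts the binary size units used
// in the config (KiB, MiB) to bytes and compares a payload size against a limit.
final class FrameSizeCheck {

    static long kibToBytes(long kib) {
        return kib * 1024L;
    }

    static long mibToBytes(long mib) {
        return mib * 1024L * 1024L;
    }

    // Simplified check: ignores envelope and header overhead.
    static boolean fits(long payloadBytes, long maxFrameBytes) {
        return payloadBytes <= maxFrameBytes;
    }

    public static void main(String[] args) {
        long oldLimit = kibToBytes(256); // 262144, the limit reported in the log
        long newLimit = mibToBytes(2);   // 2097152, the limit after the change
        long examplePayload = 300_000L;  // assumed gossip size, for illustration only

        System.out.println(oldLimit);                       // prints 262144
        System.out.println(newLimit);                       // prints 2097152
        System.out.println(fits(examplePayload, oldLimit)); // prints false
        System.out.println(fits(examplePayload, newLimit)); // prints true
    }
}
```

The sketch only compares sizes; it does not explain why the gossip message grew past the original limit.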
