
Vert.x performance drop when starting with -cluster option

I'm wondering if anyone has experienced the same problem.

We have a Vert.x application whose ultimate purpose is to insert 600 million rows into a Cassandra cluster. We are testing the speed of Vert.x in combination with Cassandra by running tests with smaller amounts.

If we run the fat jar (built with the Shade plugin) without the -cluster option, we are able to insert 10 million records in about a minute. When we add the -cluster option (eventually we will run the Vert.x application in a cluster), it takes about 5 minutes to insert 10 million records.

Does anyone know why?

We know that the Hazelcast config creates some overhead, but we never thought it would be 5 times slower. This implies we would need 5 EC2 instances in the cluster to get the same result as 1 EC2 instance without the cluster option.

As mentioned, everything runs on EC2 instances:

  • 2 Cassandra servers on t2.small
  • 1 Vert.x server on t2.2xlarge

You are actually running into corner cases of the Vert.x Hazelcast cluster manager.

First of all, you are using a worker verticle to send your messages (30,000,001 of them). Under the hood, Hazelcast calls are blocking, and version 3.3.3 does not take that into account when you send a message from a worker. We recently added a fix, https://github.com/vert-x3/issues/issues/75 (not present in 3.4.0.Beta1 but present in 3.4.0-SNAPSHOTS), that will improve this case.

Second, when you send all your messages at the same time, you run into another corner case that prevents the Hazelcast cluster manager from using its cache of the cluster topology. This topology cache is usually updated after the first message has been sent, and sending all the messages in one shot prevents the cache from being used (short explanation: HazelcastAsyncMultiMap#getInProgressCount will be > 0, which disables the cache), hence paying the penalty of an expensive lookup every time (which is exactly what the cache is there to avoid).

If I use Bertjan's reproducer with 3.4.0-SNAPSHOT + Hazelcast and the following change — send one message to the destination, wait for the reply, and upon reply send all the remaining messages — then I get a lot of improvement.

Without clustering: 5852 ms
With clustering, HZ 3.3.3: 16745 ms
With clustering, HZ 3.4.0-SNAPSHOT + initial message: 8609 ms
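The warm-up trick above can be sketched as follows. This is a minimal, runnable illustration, not the actual reproducer: `sendAndAwaitReply` is a hypothetical stand-in for the event bus's send-with-reply-handler call, completing immediately here so the sketch runs without a Vert.x dependency.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class WarmupSketch {

    // Stand-in for an event-bus send that completes when the consumer
    // replies; here it completes immediately so the sketch is runnable
    // without Vert.x.
    static CompletableFuture<Void> sendAndAwaitReply(String msg) {
        return CompletableFuture.completedFuture(null);
    }

    // Send one warm-up message and wait for its reply (which gives the
    // cluster manager a chance to populate its topology cache), then send
    // the remaining total - 1 messages in one go. Returns the number of
    // messages sent after the warm-up.
    static int run(int total) {
        AtomicInteger sent = new AtomicInteger();
        sendAndAwaitReply("warm-up")
                .thenRun(() -> {
                    for (int i = 2; i <= total; i++) {
                        // in the real code: vertx.eventBus().send(...)
                        sent.incrementAndGet();
                    }
                })
                .join();
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints 9
    }
}
```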

I also believe you should not use a worker verticle to send that many messages; instead, send them from an event loop verticle in batches. Perhaps you should explain your use case and we can think about the best way to solve it.
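The batching idea can be sketched independently of the Vert.x API. In this illustrative sketch, `sendBatch` is a hypothetical callback standing in for the per-batch event-bus send; on an event loop verticle you would dispatch one batch and only continue with the next once it has been handled, rather than flooding the bus with 30 million pending sends at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchSender {

    // Splits the range [1..total] into batches of batchSize and hands each
    // batch to sendBatch. Returns the number of batches dispatched.
    static int sendInBatches(int total, int batchSize, Consumer<List<Integer>> sendBatch) {
        int batches = 0;
        List<Integer> batch = new ArrayList<>(batchSize);
        for (int i = 1; i <= total; i++) {
            batch.add(i);
            if (batch.size() == batchSize) {
                sendBatch.accept(batch); // e.g. eventBus().send(...) per item
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // trailing partial batch
            sendBatch.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        int batches = sendInBatches(10_000, 1_000, b -> { /* send batch */ });
        System.out.println(batches); // prints 10
    }
}
```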

When you enable clustering (of any kind) in an application, you make the application more resilient to failures, but you also add a performance penalty.

For example, your current flow (without clustering) is something like:

client -> 
  vert.x app -> 
    in memory same process eventbus (negligible) ->
    handler -> cassandra
  <- vert.x app
<- client

Once you enable clustering:

client ->
  vert.x app ->
    serialize request ->
      network request cluster member ->
        deserialize request ->
          handler -> cassandra
        <- serialize response
      <- network reply
    <- deserialize response
  <- vert.x app
<- client

As you can see, there are many encode/decode operations required, plus several network calls, and all of this gets added to your total request time.

In order to achieve the best performance you need to take advantage of locality: the closer you are to your data store, the faster it usually is.

Just to add the code of the project. I guess that would help.

Sender verticle:

public class ProviderVerticle extends AbstractVerticle {

    @Override
    public void start() throws Exception {
        // Send 30 million JSON-encoded records over the event bus
        IntStream.range(1, 30000001).parallel().forEach(i -> {
            vertx.eventBus().send("clustertest1",
                    Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
    }
}

And the inserter verticle:

public class ReceiverVerticle extends AbstractVerticle {

    private int messagesReceived = 1;

    private Session cassandraSession;

    @Override
    public void start() throws Exception {

        PoolingOptions poolingOptions = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 3)
                .setCoreConnectionsPerHost(HostDistance.REMOTE, 1)
                .setMaxConnectionsPerHost(HostDistance.REMOTE, 3)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 20)
                .setMaxQueueSize(32768)
                .setMaxRequestsPerConnection(HostDistance.REMOTE, 20);

        Cluster cluster = Cluster.builder()
                .withPoolingOptions(poolingOptions)
                .addContactPoints(ClusterSetup.SEEDS)
                .build();

        System.out.println("Connecting session");
        cassandraSession = cluster.connect("kiespees");
        System.out.println("Session connected:\n\tcluster [" + cassandraSession.getCluster().getClusterName() + "]");
        System.out.println("Connected hosts: ");

        cassandraSession.getState().getConnectedHosts().forEach(host -> System.out.println(host.getAddress()));

        PreparedStatement prepared = cassandraSession.prepare(
                "insert into clustertest1 (id, value, created) " +
                        "values (:id, :value, :created)");

        PreparedStatement preparedTimer = cassandraSession.prepare(
                "insert into timer (name, created_on, amount) " +
                        "values (:name, :createdOn, :amount)");

        BoundStatement timerStart = preparedTimer.bind()
                .setString("name", "clusterteststart")
                .setInt("amount", 0)
                .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
        cassandraSession.executeAsync(timerStart);

        EventBus bus = vertx.eventBus();

        System.out.println("Bus info: " + bus.toString());
        MessageConsumer<String> cons = bus.consumer("clustertest1");
        System.out.println("Consumer info: " + cons.address());

        System.out.println("Waiting for messages");

        cons.handler(message -> {
            TestCluster1 tc = Json.decodeValue(message.body(), TestCluster1.class);

            if (messagesReceived % 100000 == 0)
                System.out.println("Message received: " + messagesReceived);

            BoundStatement boundRecord = prepared.bind()
                    .setInt("id", tc.getId())
                    .setString("value", tc.getValue())
                    .setTimestamp("created", new Timestamp(new Date().getTime()));
            cassandraSession.executeAsync(boundRecord);

            if (messagesReceived % 100000 == 0) {
                BoundStatement timerStop = preparedTimer.bind()
                        .setString("name", "clusterteststop")
                        .setInt("amount", messagesReceived)
                        .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
                cassandraSession.executeAsync(timerStop);
            }

            messagesReceived++;
            //message.reply("OK");
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
        cassandraSession.close();
    }
}
