
Why is my Cassandra Prepared Statement Ingest of Data so slow?

I have a Java list of 100,000 names that I'd like to ingest into a 3-node Cassandra cluster running DataStax Enterprise 5.1 with Cassandra 3.10.0.

My code ingests the data, but it takes a very long time. I ran a stress test on the cluster and was able to do over 25,000 writes per second. With my ingest code I am getting a terrible performance of around 200 writes per second.

My Java List has 100,000 names in it and is called myList. I use the following prepared statement and session execution to ingest the data.

PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

int id = 0;

for (int i = 0; i < myList.size(); i++) {
    id += 1;
    session.execute(prepared.bind(id, myList.get(i)));
}

I added a cluster monitor to my code to see what was going on. Here is my monitoring code.

    // Monitor the state of the cluster connections every 5 seconds
    final LoadBalancingPolicy loadBalancingPolicy =
            cluster.getConfiguration().getPolicies().getLoadBalancingPolicy();
    // Assumed: poolingOptions was not defined in the original snippet
    final PoolingOptions poolingOptions =
            cluster.getConfiguration().getPoolingOptions();
    ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);
    scheduled.scheduleAtFixedRate(() -> {
        Session.State state = session.getState();
        state.getConnectedHosts().forEach((host) -> {
            HostDistance distance = loadBalancingPolicy.distance(host);
            int connections = state.getOpenConnections(host);
            int inFlightQueries = state.getInFlightQueries(host);
            System.out.printf("%s connections=%d, current load=%d, maxload=%d%n",
                    host, connections, inFlightQueries,
                    connections * poolingOptions.getMaxRequestsPerConnection(distance));
        });
    }, 5, 5, TimeUnit.SECONDS);

The monitor prints every 5 seconds. Here is the output for 3 iterations:

/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=0, maxload=32768
/192.168.20.26:9042 connections=1, current load=1, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768

It doesn't appear that I am utilizing my cluster very effectively. I'm not sure what I am doing wrong and would greatly appreciate any tips.

Thank you!

Use executeAsync.

Executes the provided query asynchronously. This method does not block. It returns as soon as the query has been passed to the underlying network stack. In particular, returning from this method does not guarantee that the query is valid or has even been submitted to a live node. Any exception pertaining to the failure of the query will be thrown when accessing the ResultSetFuture.
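A rough estimate helps explain why the blocking loop tops out near 200 writes per second: with only one request in flight at a time, throughput is the inverse of the round-trip latency. The sketch below works through that arithmetic using the 200 writes/s from the question and a limit of 100 concurrent requests. The class and method names are hypothetical, and the bound assumes per-request latency stays constant under load, which a real cluster does not guarantee.

```java
/** Back-of-the-envelope throughput estimate for sequential vs. concurrent writes. */
final class ThroughputEstimate {

    /** Average time per request (ms) implied by a throughput with one request in flight. */
    static double latencyMillis(double requestsPerSecond) {
        return 1000.0 / requestsPerSecond;
    }

    /** Idealized upper bound on throughput if inFlight requests overlap at the same latency. */
    static double maxThroughput(int inFlight, double latencyMillis) {
        return inFlight * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        double latency = latencyMillis(200);        // observed: ~200 blocking writes per second
        double bound = maxThroughput(100, latency); // assumed: 100 requests in flight
        System.out.printf("latency per write: %.1f ms%n", latency);
        System.out.printf("idealized bound with 100 in flight: %.0f writes/s%n", bound);
    }
}
```

Under these assumptions each blocking write costs about 5 ms, so overlapping requests, rather than making each one faster, is where most of the gain would come from.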

You are inserting a huge amount of data. If you use executeAsync and your cluster cannot handle that many concurrent requests, it can throw an exception. You can limit the number of in-flight executeAsync calls with a Semaphore.

Example:

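PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

// Upper bound on the number of requests in flight at once
int numberOfConcurrentQueries = 100;
final Semaphore semaphore = new Semaphore(numberOfConcurrentQueries);

int id = 0;

for (int i = 0; i < myList.size(); i++) {
    id += 1;
    try {
        // Blocks while numberOfConcurrentQueries requests are still pending
        semaphore.acquire();
    } catch (InterruptedException e) {
        // No permit was acquired, so there is nothing to release
        Thread.currentThread().interrupt();
        break;
    }
    try {
        ResultSetFuture future = session.executeAsync(prepared.bind(id, myList.get(i)));
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet result) {
                semaphore.release();
            }

            @Override
            public void onFailure(Throwable t) {
                semaphore.release();
            }
        });
    } catch (Exception e) {
        // executeAsync or the callback registration failed; free the permit
        semaphore.release();
        e.printStackTrace();
    }
}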
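One thing the loop above does not show is waiting for the last in-flight requests before closing the session. A possible approach is to acquire every permit once the loop ends, since that can only succeed after all callbacks have released theirs. The following is a minimal, driver-free sketch of that idea: fakeWriteAsync is a hypothetical stand-in for session.executeAsync built on CompletableFuture, and the class and method names are illustrative only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledWrites {

    /** Hypothetical stand-in for session.executeAsync(...): completes on another thread. */
    static CompletableFuture<Void> fakeWriteAsync(int id) {
        return CompletableFuture.runAsync(() -> { /* pretend to write row id */ });
    }

    /** Issues total fake writes with at most maxInFlight pending; returns how many completed. */
    static int run(int total, int maxInFlight) {
        Semaphore semaphore = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();

        for (int id = 1; id <= total; id++) {
            // Blocks while maxInFlight writes are still pending
            semaphore.acquireUninterruptibly();
            fakeWriteAsync(id).whenComplete((ok, err) -> {
                completed.incrementAndGet();
                semaphore.release(); // free the slot on success or failure
            });
        }

        // Drain: all permits are available only when no write is still in flight
        semaphore.acquireUninterruptibly(maxInFlight);
        semaphore.release(maxInFlight);
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + run(1000, 100));
    }
}
```

With the real driver, the same drain step (acquiring numberOfConcurrentQueries permits after the loop) would go before closing the session; an interruptible acquire() may be preferable there so the ingest can be cancelled.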

Sources:
https://stackoverflow.com/a/30526719/2320144
https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html#executeAsync-com.datastax.driver.core.Statement-
