
Why is my Cassandra Prepared Statement Ingest of Data so slow?

I have a Java list of 100,000 names that I'd like to ingest into a 3-node Cassandra cluster running DataStax Enterprise 5.1 with Cassandra 3.10.0.

My code works, but it takes a looooong time. A stress test on the cluster achieved over 25,000 writes per second, yet my ingest code gets a terrible ~200 writes per second.

My Java List has 100,000 names in it and is called myList. I use the following prepared statement and session execution to ingest the data.

PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

int id = 0;

// One synchronous insert per name: execute() waits for each result before returning
for (int i = 0; i < myList.size(); i++) {
    id += 1;
    session.execute(prepared.bind(id, myList.get(i)));
}

I added a cluster monitor to my code to see what was going on. Here is my monitoring code.

// Print the connection pool state of every connected host every 5 seconds
final LoadBalancingPolicy loadBalancingPolicy =
        cluster.getConfiguration().getPolicies().getLoadBalancingPolicy();
final PoolingOptions poolingOptions = cluster.getConfiguration().getPoolingOptions();
ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);
scheduled.scheduleAtFixedRate(() -> {
    Session.State state = session.getState();
    state.getConnectedHosts().forEach((host) -> {
        HostDistance distance = loadBalancingPolicy.distance(host);
        int connections = state.getOpenConnections(host);
        int inFlightQueries = state.getInFlightQueries(host);
        System.out.printf("%s connections=%d, current load=%d, maxload=%d%n",
                host, connections, inFlightQueries,
                connections * poolingOptions.getMaxRequestsPerConnection(distance));
    });
}, 5, 5, TimeUnit.SECONDS);

The monitor printed the following over three 5-second intervals:

/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=0, maxload=32768
/192.168.20.26:9042 connections=1, current load=1, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768

Each host shows at most one in-flight query out of a possible 32,768, so it doesn't look like I'm using my cluster very effectively. I'm not sure what I'm doing wrong and would greatly appreciate any tips.

Thank you!

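Use executeAsync instead of execute. session.execute blocks until each insert has been acknowledged, so your loop never has more than one request in flight (your monitor shows a load of 0 or 1 on every host), and throughput is capped by the round-trip latency rather than by what the cluster can absorb.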
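To put rough numbers on that latency cap: by Little's law, sustained throughput ≈ requests in flight ÷ time per request. The sketch below assumes the ~200 writes/s from the question came from exactly one request in flight, which implies roughly 5 ms per round trip. It then computes the upper bound for 100 concurrent requests at that same latency. Real latency rises under load, so treat the result as a ceiling, not a prediction.

```java
public class LittlesLaw {

    // Little's law rearranged: throughput = requests in flight / time per request.
    static double maxThroughput(int inFlight, double latencySeconds) {
        return inFlight / latencySeconds;
    }

    public static void main(String[] args) {
        // Assumption: the ~200 writes/s in the question came from exactly one
        // request in flight, so each round trip took about 1/200 s.
        double observedRate = 200.0;
        double impliedLatency = 1.0 / observedRate;

        System.out.printf("implied latency: %.1f ms%n", impliedLatency * 1000);
        System.out.printf("1 in flight:   %.0f writes/s%n", maxThroughput(1, impliedLatency));
        System.out.printf("100 in flight: %.0f writes/s%n", maxThroughput(100, impliedLatency));
    }
}
```

With 100 requests in flight the bound is about 20,000 writes/s, the same order of magnitude as the stress-test result. That suggests concurrency, not the prepared statement itself, is what limits the loop.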

The driver's Javadoc for Session.executeAsync describes it like this: "Executes the provided query asynchronously. This method does not block. It returns as soon as the query has been passed to the underlying network stack. In particular, returning from this method does not guarantee that the query is valid or has even been submitted to a live node. Any exception pertaining to the failure of the query will be thrown when accessing the ResultSetFuture."
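The last sentence of that quote matters for error handling: a failed insert does not throw from executeAsync itself. As a stdlib analogy only, with CompletableFuture standing in for ResultSetFuture and failingWriteAsync as a hypothetical placeholder (not a driver method), the failure surfaces only when the future is read:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class AsyncFailureDemo {

    // Hypothetical stand-in for a failing executeAsync call: it returns a future
    // immediately, and the failure is recorded in that future later.
    static CompletableFuture<String> failingWriteAsync() {
        return CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("write timed out");
        });
    }

    static String surfacedError() {
        CompletableFuture<String> future = failingWriteAsync(); // returns at once, no exception yet
        try {
            future.get(); // the failure is reported only here
            return "no error";
        } catch (ExecutionException e) {
            return e.getCause().getMessage(); // the original exception is the cause
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println("error seen when reading the future: " + surfacedError());
    }
}
```

With the driver, the equivalent place to see the error is the future's get() or an onFailure callback, which is why any bulk loader built on executeAsync should inspect its futures rather than assume the writes succeeded.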

You are inserting a large amount of data. If you call executeAsync in an unbounded loop and the cluster cannot keep up, requests can start failing with exceptions. You can cap the number of concurrent executeAsync calls with a Semaphore.

Example:

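import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.Semaphore;

PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

// At most this many inserts are in flight at any moment
int numberOfConcurrentQueries = 100;
final Semaphore semaphore = new Semaphore(numberOfConcurrentQueries);

int id = 0;

// acquire() throws InterruptedException, so the enclosing method must declare or handle it
for (int i = 0; i < myList.size(); i++) {
    id += 1;
    semaphore.acquire(); // blocks while 100 inserts are still outstanding
    try {
        ResultSetFuture future = session.executeAsync(prepared.bind(id, myList.get(i)));
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet result) {
                semaphore.release();
            }

            @Override
            public void onFailure(Throwable t) {
                semaphore.release();
                t.printStackTrace(); // a failed insert is only reported here
            }
        }, MoreExecutors.directExecutor());
    } catch (RuntimeException e) {
        // bind() or executeAsync() threw before the callback was registered
        semaphore.release();
        e.printStackTrace();
    }
}

// Reclaim every permit: this returns only after all outstanding inserts have completed
semaphore.acquire(numberOfConcurrentQueries);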
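To sanity-check the bounding logic of the example above without a cluster, here is the same pattern (acquire a permit before sending, release it in the completion callback, reclaim all permits at the end) using only java.util.concurrent. fakeInsertAsync is a hypothetical stand-in for session.executeAsync that simply completes about 2 ms later on a timer thread; the counters exist only to show that outstanding requests never exceed the permit count and that the final acquire waits for every insert.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncDemo {

    static final class Result {
        final int completed;
        final int maxInFlight;

        Result(int completed, int maxInFlight) {
            this.completed = completed;
            this.maxInFlight = maxInFlight;
        }
    }

    // Hypothetical stand-in for session.executeAsync(...): returns at once and
    // completes about 2 ms later on a timer thread, like a network round trip.
    static CompletableFuture<Void> fakeInsertAsync(ScheduledExecutorService timer) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        timer.schedule(() -> { future.complete(null); }, 2, TimeUnit.MILLISECONDS);
        return future;
    }

    static Result run(int totalInserts, int maxConcurrent) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < totalInserts; i++) {
                permits.acquire(); // blocks while maxConcurrent requests are outstanding
                maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                fakeInsertAsync(timer).whenComplete((ignored, error) -> {
                    inFlight.decrementAndGet();
                    completed.incrementAndGet();
                    permits.release(); // released on success and on failure alike
                });
            }
            // Taking every permit back means all outstanding requests have finished.
            permits.acquire(maxConcurrent);
        } finally {
            timer.shutdown();
        }
        return new Result(completed.get(), maxInFlight.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(1_000, 50);
        System.out.println("completed=" + r.completed + ", max in flight=" + r.maxInFlight);
    }
}
```

Because the in-flight counter is incremented only after a permit is acquired and decremented before that permit is released, it can never exceed maxConcurrent. The closing acquire(maxConcurrent) plays the same role as the last line of the driver example: it keeps the program from moving on (or closing the session) while inserts are still pending.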

Sources:
https://stackoverflow.com/a/30526719/2320144
https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html#executeAsync-com.datastax.driver.core.Statement-

