Why is my Cassandra prepared statement data ingestion so slow?

Question

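I have a Java list of 100,000 names that I want to ingest into a 3-node Cassandra cluster running Datastax Enterprise 5.1 with Cassandra 3.10.0.

My code does ingest the data, but it takes a very long time. I ran a stress test on the cluster and was able to do 25,000 writes per second. With my ingestion code I get terrible performance of around 200 writes per second.

My Java list has 100,000 names and is called myList. I use the following prepared statement and session execute to ingest the data.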

PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

int id = 0;

for (int i = 0; i < myList.size(); i++) {
    id += 1;
    session.execute(prepared.bind(id, myList.get(i)));
}

I added a cluster monitor to my code to see what was going on. Here is my monitoring code.

// Monitoring status of cluster
final LoadBalancingPolicy loadBalancingPolicy =
        cluster.getConfiguration().getPolicies().getLoadBalancingPolicy();
ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(1);
scheduled.scheduleAtFixedRate(() -> {
    Session.State state = session.getState();
    state.getConnectedHosts().forEach((host) -> {
        HostDistance distance = loadBalancingPolicy.distance(host);
        int connections = state.getOpenConnections(host);
        int inFlightQueries = state.getInFlightQueries(host);
        System.out.printf("%s connections=%d, current load=%d, maxload=%d%n",
                host, connections, inFlightQueries,
                connections * poolingOptions.getMaxRequestsPerConnection(distance));
    });
}, 5, 5, TimeUnit.SECONDS);

The monitor, which prints every 5 seconds, shows the following for 3 iterations:

/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=0, maxload=32768
/192.168.20.26:9042 connections=1, current load=1, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768

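It appears that I am not utilizing my cluster very effectively. I am not sure what I am doing wrong and would greatly appreciate any tips.

Thank you!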

Answer 1

Use executeAsync.
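A rough sketch of why this helps, assuming the round-trip latency per write stays roughly constant as concurrency grows (an assumption, not something measured in the question): the monitor output never shows more than one request in flight, so a blocking execute loop completes about one write per round trip. The helper names below are hypothetical and only do the arithmetic.

```java
class ThroughputEstimate {
    // Average latency implied by a throughput at a given number of
    // requests in flight (Little's law: inFlight = throughput * latency).
    static double latencyMillis(double requestsPerSecond, int inFlight) {
        return inFlight * 1000.0 / requestsPerSecond;
    }

    // Upper bound on throughput if latency stays the same (assumption).
    static double maxRequestsPerSecond(double latencyMillis, int inFlight) {
        return inFlight * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        // 200 writes/s with a single request in flight, as in the question.
        double latency = latencyMillis(200.0, 1);
        System.out.printf("implied latency: %.1f ms%n", latency);
        // Same latency with 100 requests in flight.
        System.out.printf("upper bound: %.0f writes/s%n",
                maxRequestsPerSecond(latency, 100));
    }
}
```

With the question's figures this gives about 5 ms per write, and an upper bound of about 20,000 writes per second at 100 requests in flight. A real cluster will see latency rise under load, so treat that as a ceiling rather than a prediction.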

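Executes the provided query asynchronously. This method does not block. It returns as soon as the query has been passed to the underlying network stack. In particular, returning from this method does not guarantee that the query is valid or has even been submitted to a live node. Any exception pertaining to the failure of the query will be thrown upon accessing the ResultSetFuture.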
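To illustrate the last sentence without needing a cluster, here is a minimal sketch that uses the JDK's CompletableFuture as a stand-in for ResultSetFuture (an assumption made so the example runs on its own; the driver's future is a different type). The failingWrite helper and its message are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class AsyncFailureDemo {
    // Hypothetical stand-in for an asynchronous write that ends up failing.
    static CompletableFuture<String> failingWrite() {
        CompletableFuture<String> future = new CompletableFuture<>();
        future.completeExceptionally(new IllegalStateException("write rejected"));
        return future;
    }

    // Returns the failure message seen when the result is accessed,
    // or null if the write succeeded.
    static String failureSeenOnAccess(CompletableFuture<String> future) {
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Obtaining the future raises nothing, even though the write failed.
        CompletableFuture<String> future = failingWrite();
        // The failure only becomes visible once the result is accessed.
        System.out.println("failure on access: " + failureSeenOnAccess(future));
    }
}
```

The practical consequence is that a loop which calls executeAsync and never inspects the futures (or registers a callback) can lose writes silently.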

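You are inserting a large amount of data. If you use executeAsync and your cluster cannot handle that volume, it may throw exceptions. You can throttle executeAsync with a semaphore.

Example: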

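import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.concurrent.Semaphore;

PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");

// Upper bound on the number of writes in flight at any moment.
int numberOfConcurrentQueries = 100;
final Semaphore semaphore = new Semaphore(numberOfConcurrentQueries);

int id = 0;

for (int i = 0; i < myList.size(); i++) {
    id += 1;
    try {
        // Blocks once numberOfConcurrentQueries writes are outstanding.
        semaphore.acquire();
    } catch (InterruptedException e) {
        // No permit was taken, so there is nothing to release.
        Thread.currentThread().interrupt();
        break;
    }
    try {
        ResultSetFuture future = session.executeAsync(prepared.bind(id, myList.get(i)));
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet result) {
                semaphore.release();
            }

            @Override
            public void onFailure(Throwable t) {
                // The write failed; still return the permit.
                semaphore.release();
            }
        });
    } catch (Exception e) {
        // Submission failed before a callback was registered; return the permit here.
        semaphore.release();
        e.printStackTrace();
    }
}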
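One thing the loop above does not do is wait for the last writes: when it exits, up to numberOfConcurrentQueries requests can still be in flight. A common way to drain them is to acquire every permit before moving on. The sketch below shows that pattern with JDK classes only, using CompletableFuture on a small thread pool as a stand-in for executeAsync (an assumption so the example runs without a cluster); runAll and the simulated write are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledDrainDemo {
    // Submits `total` simulated writes with at most `maxInFlight` outstanding,
    // waits for all of them, and returns how many completed.
    static int runAll(int total, int maxInFlight) {
        Semaphore semaphore = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();
        // Stand-in for the driver's asynchronous execution (assumption).
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int i = 0; i < total; i++) {
                semaphore.acquireUninterruptibly();
                CompletableFuture
                        .runAsync(completed::incrementAndGet, pool)
                        // Return the permit whether the write succeeded or failed.
                        .whenComplete((result, error) -> semaphore.release());
            }
            // Drain: all permits are available only once nothing is in flight.
            semaphore.acquireUninterruptibly(maxInFlight);
            semaphore.release(maxInFlight);
            return completed.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed writes: " + runAll(100_000, 100));
    }
}
```

In the driver version, the same two drain lines would go right after the for loop and before the session is closed.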

Resources:
https://stackoverflow.com/a/30526719/2320144
https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html#executeAsync-com.datastax.driver.core.Statement-

Solution 1: 4, accepted, 2017-04-22 13:22:27
