Cassandra read performance using the Astyanax client

Question

We are using Cassandra database in our production environment. We have a single cross colo cluster of 24 nodes, meaning 12 nodes in PHX and 12 nodes in SLC colo. We have a replication factor of 4, which means 2 copies will be there in each datacenter.

Below is how the keyspace and column families were created by our Production DBAs.

create keyspace profile with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {slc:2,phx:2};
 create column family PROFILE_USER with key_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and gc_grace = 86400; 

We are running Cassandra 1.2.2 with org.apache.cassandra.dht.Murmur3Partitioner, and KeyCaching, SizeTieredCompactionStrategy and Virtual Nodes are enabled. The Cassandra nodes are deployed on HDDs instead of SSDs.

I am using the Astyanax client to read data from the Cassandra database with consistency level ONE. Using the Astyanax client, I inserted 50 million records into the production cluster (around 285GB of data in total across the 24 nodes), and after compaction finished I started running reads against the Cassandra production database.

Below is the code I use to create the connection configuration with the Astyanax client:

/**
 * Creating Cassandra connection using Astyanax client
 *
 */
private CassandraAstyanaxConnection() {

    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(100)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        .setLocalDatacenter("phx") // only use nodes in the local (phx) datacenter
    )
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
        .setConnectionPoolType(ConnectionPoolType.ROUND_ROBIN)
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY, 
        StringSerializer.get(), 
        StringSerializer.get());
}

Most of the time, I get a 95th percentile read latency of around 8/9/10 ms.
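Because the whole question hinges on the 95th percentile figure, it helps to be explicit about how it is computed from per-request timings. Below is a minimal sketch using the nearest-rank method over latencies recorded around each read; the class name and sample values are hypothetical, and the question doesn't say which percentile method the original test harness used.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over per-request latencies.
final class LatencyPercentile {
    private LatencyPercentile() {}

    // Returns the p-th percentile (0 < p <= 100): sort the samples and take the
    // value at 1-based rank ceil(p * n / 100).
    static long percentile(long[] latenciesMicros, double p) {
        if (latenciesMicros.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMicros.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical samples in microseconds: mostly ~2 ms with a slow tail.
        long[] samples = {1800, 2100, 1900, 2500, 9000, 2200, 2000, 8500, 2300, 2400};
        System.out.println("p50 = " + percentile(samples, 50) + " us");
        System.out.println("p95 = " + percentile(samples, 95) + " us");
    }
}
```

With these ten made-up samples the p50 is 2200 µs but the p95 is 9000 µs: a couple of slow reads dominate the tail, which is why a low average ping time says little about the 95th percentile.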

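I want to see whether there is any way to get better read performance out of the Cassandra database. I was under the impression that the 95th percentile would be at 1 or 2 ms, but after running some tests against the production cluster, all my assumptions turned out to be wrong. The average ping time from the machine running my client program to the Cassandra production nodes is 0.3ms.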

Below are the results I am getting.

Read latency (95th percentile): 8 ms
Number of threads: 10
Duration the program was running: 30 minutes
Throughput: 1584 requests/second
Total number of ids requested: 2851481
Total number of columns requested: 52764072
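The numbers above are internally consistent, and they also give a rough mean latency. Assuming each of the 10 threads issues one read at a time and waits for it (a closed loop), Little's law (concurrency = throughput × mean time per request) gives:

```latex
\begin{aligned}
X &= \frac{2\,851\,481\ \text{ids}}{30 \times 60\ \text{s}} \approx 1584\ \text{requests/s} \\
\bar{W} &\approx \frac{N}{X} = \frac{10\ \text{threads}}{1584\ \text{requests/s}} \approx 6.3\ \text{ms} \\
\text{columns per id} &= \frac{52\,764\,072}{2\,851\,481} \approx 18.5
\end{aligned}
```

A mean of roughly 6.3 ms per request (which also includes client-side time) sits below the 8 ms 95th percentile, and each read returns about 18.5 columns on average, so these are multi-column row reads rather than single-column lookups.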

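Can anyone shed some light on what else I can try to get good read latency? I know there are probably others in a similar situation running Cassandra in production. Any help will be appreciated.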

Thanks for the help.

Answer 1

I would try the following:

Astyanax

Set ConnectionPoolType to TOKEN_AWARE instead of ROUND_ROBIN.
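A token-aware pool can help because it sends each read directly to a node that owns the row, whereas round robin picks a coordinator without looking at the key, and that coordinator usually has to forward the read to a replica (an extra network hop). With 12 nodes in phx and 2 copies of each row there, a randomly chosen phx coordinator holds the row only about 2 times in 12. The toy model below illustrates the difference. It is not Astyanax code: the FNV-1a hash is a stand-in for Cassandra's Murmur3Partitioner, only the first replica on the ring is considered, and TokenRouting and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT Astyanax's TOKEN_AWARE pool and NOT Cassandra's Murmur3Partitioner.
final class TokenRouting {
    // token -> node; a node owns the range (previous token, its token]
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final List<String> nodes;
    private int next = 0; // round-robin cursor

    TokenRouting(Map<Long, String> tokenToNode, List<String> nodes) {
        ring.putAll(tokenToNode);
        this.nodes = new ArrayList<>(nodes);
    }

    // Stand-in 64-bit FNV-1a hash; Cassandra 1.2 with Murmur3Partitioner uses Murmur3 instead.
    static long hash(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // First node clockwise from the token, wrapping around to the start of the ring.
    String ownerOf(long token) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Token-aware: route the read straight to the node that owns the key's token.
    String tokenAware(String key) {
        return ownerOf(hash(key));
    }

    // Round robin: ignores the key, so the chosen coordinator often has to forward the read.
    String roundRobin() {
        String n = nodes.get(next);
        next = (next + 1) % nodes.size();
        return n;
    }
}
```

The same key always maps to the same owner under token-aware routing, while round robin cycles through nodes regardless of which row is being read.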

I would also use some of the Astyanax latency-aware connection pool features. For example:

.withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(100)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        .setLocalDatacenter("phx") // only use nodes in the local (phx) datacenter
        // SMA score strategy: update interval (ms), reset interval (ms), window size, badness threshold
        .setLatencyScoreStrategy(new SmaLatencyScoreStrategyImpl(10000,10000,100,0.50))
    )

The latency settings are passed through the constructor of the score strategy, for example SmaLatencyScoreStrategyImpl.
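The idea behind an SMA (simple moving average) latency score is to rate each host by the average of its recent latencies and steer requests away from hosts that are noticeably slower than the best one. The sketch below models that idea in plain Java. It is not Astyanax's SmaLatencyScoreStrategyImpl: the time-based update and reset intervals are not modeled, and the rule that a host is "bad" when its score exceeds the best score by more than the badness threshold (0.50 meaning 50%) is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified model of SMA latency scoring; NOT Astyanax's SmaLatencyScoreStrategyImpl.
final class SmaLatencyScores {
    private final int windowSize;          // number of recent samples kept per host
    private final double badnessThreshold; // assumed: 0.50 means "50% worse than the best host"
    private final Map<String, Deque<Long>> samples = new HashMap<>();

    SmaLatencyScores(int windowSize, double badnessThreshold) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        this.windowSize = windowSize;
        this.badnessThreshold = badnessThreshold;
    }

    // Add one latency observation, dropping the oldest once the window is full.
    void record(String host, long latencyMicros) {
        Deque<Long> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        window.addLast(latencyMicros);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
    }

    // Simple moving average of the host's recent latencies (0 if there are no samples yet).
    double score(String host) {
        Deque<Long> window = samples.get(host);
        if (window == null || window.isEmpty()) return 0.0;
        long sum = 0;
        for (long v : window) sum += v;
        return (double) sum / window.size();
    }

    // A host is "bad" when its score exceeds the best host's score by more than the threshold.
    boolean isBad(String host) {
        double best = Double.MAX_VALUE;
        for (String h : samples.keySet()) best = Math.min(best, score(h));
        return score(host) > best * (1.0 + badnessThreshold);
    }
}
```

A pool built on this idea would prefer hosts where isBad returns false, so one slow node (for example, one busy compacting on its HDD) stops dragging down the tail.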

I'm working through this as well, so if I learn anything else I'll post it here.

See: Latency and Token Aware configuration

Cassandra

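There are a few things you can do to optimize reads. Note: I haven't tried these yet, but they are on my list of things to investigate (so I figured I'd share).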

Caching

Enable the key cache and the row cache.

Key cache

bin/nodetool --host 127.0.0.1 --port 8080 setcachecapacity MyKeyspace MyColumnFam 200001 0

Row cache

bin/nodetool --host 127.0.0.1 --port 8080 setcachecapacity MyKeyspace MyColumnFam 0 200005

Then, after hammering that node for a while with your application's workload, check the hit rate:

bin/nodetool --host 127.0.0.1 --port 8080 cfstats
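The hit rate being checked is simply hits divided by lookups. The toy cache below shows that bookkeeping together with least-recently-used eviction, using LinkedHashMap in access order. It is only an analogy: Cassandra's key and row caches are implemented differently, and LruCache and lookup are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache that tracks its own hit rate; an analogy, not Cassandra's cache implementation.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    private long hits;
    private long requests;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
        this.capacity = capacity;
    }

    // Evict the least recently used entry once the cache grows past its capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    // Look a key up and record whether it was a hit.
    V lookup(K key) {
        requests++;
        V v = get(key);
        if (v != null) hits++;
        return v;
    }

    double hitRate() {
        return requests == 0 ? 0.0 : (double) hits / requests;
    }
}
```

If reads are spread uniformly over many more rows than the cache can hold, an LRU cache's steady-state hit rate is roughly its capacity divided by the number of distinct rows, so it is worth comparing the configured capacity with the number of rows each node actually serves.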

Consistency

Consider a read consistency level of ONE. See "Data Consistency" (it is DataStax documentation, but still relevant).
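To make the trade-off concrete for this keyspace (strategy_options {slc:2, phx:2}, reads routed to phx), the sketch below counts how many replicas must answer before a read returns at a few consistency levels, using the usual quorum rule of a strict majority, floor(n/2) + 1. ReplicasRequired is a hypothetical helper, not a Cassandra or Astyanax API.

```java
import java.util.Map;

// Replicas that must respond before a read returns, for a NetworkTopologyStrategy keyspace.
final class ReplicasRequired {
    private ReplicasRequired() {}

    // Standard quorum rule: a strict majority of the replicas.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    static int forLevel(String level, Map<String, Integer> strategyOptions, String localDc) {
        int total = 0;
        for (int rf : strategyOptions.values()) total += rf; // sum of the per-datacenter RFs
        Integer localRf = strategyOptions.get(localDc);
        if (localRf == null) throw new IllegalArgumentException("unknown datacenter: " + localDc);
        switch (level) {
            case "ONE":          return 1;
            case "LOCAL_QUORUM": return quorum(localRf); // majority of the local copies only
            case "QUORUM":       return quorum(total);   // majority across all datacenters
            case "ALL":          return total;
            default: throw new IllegalArgumentException("unsupported level: " + level);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> profile = Map.of("slc", 2, "phx", 2); // strategy_options from the question
        for (String level : new String[] {"ONE", "LOCAL_QUORUM", "QUORUM", "ALL"}) {
            System.out.println(level + " -> " + forLevel(level, profile, "phx") + " replica(s)");
        }
    }
}
```

ONE, which the question already uses, is the cheapest. QUORUM would need 3 of the 4 replicas, and since only 2 live in phx, at least one answer would have to come from the SLC colo.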

Consider lowering the read repair chance.

update column family MyColumnFam with read_repair_chance=.5
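A rough way to see what read_repair_chance costs: at consistency ONE a read normally touches one replica, but when read repair fires the coordinator also checks the remaining replicas (as I understand the read path, that comparison happens in the background, so it mostly shows up as extra disk and network load rather than directly in the client's wait). The sketch below is a simplified model under exactly that assumption; ReadRepairCost is a hypothetical name.

```java
// Rough model, at consistency ONE, of how many replicas one read touches on average.
// Assumption: a read normally touches 1 replica; when read repair fires it touches all RF.
final class ReadRepairCost {
    private ReadRepairCost() {}

    static double expectedReplicasContacted(int replicationFactor, double readRepairChance) {
        if (readRepairChance < 0.0 || readRepairChance > 1.0) {
            throw new IllegalArgumentException("read_repair_chance must be in [0, 1]");
        }
        // 1 replica always, plus the other RF - 1 replicas with probability readRepairChance
        return 1.0 + readRepairChance * (replicationFactor - 1);
    }

    public static void main(String[] args) {
        for (double chance : new double[] {1.0, 0.5, 0.1}) {
            System.out.printf("read_repair_chance=%.1f -> %.2f replicas per read%n",
                    chance, expectedReplicasContacted(4, chance));
        }
    }
}
```

With RF = 4 this gives 4.00, 2.50 and 1.30 replicas per read for the three chances above; with {slc:2, phx:2}, a repaired read from phx also reaches the two copies in the other colo.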

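After lowering read_repair_chance, consider tweaking the replication factor to help read performance (but this will hurt writes, since each write goes to more nodes).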

create keyspace cache with replication_factor=XX;

Disk

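Not sure there's anything to do here, but I thought it should be included. Make sure you are using an optimal file system (e.g. ext4). If you have a high replication factor, we can optimize the disks around that (knowing that we get durability from Cassandra), i.e. decide which RAID level is best for our setup.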
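To see why the disks matter for a p95 of around 8 ms, consider the average rotational latency of a spinning disk. The question doesn't give the drives' spindle speed; assuming 7200 rpm purely for illustration, waiting half a revolution on average costs:

```latex
t_{\text{rev}} = \frac{60\ \text{s}}{7200} \approx 8.33\ \text{ms}, \qquad
\bar{t}_{\text{rot}} = \tfrac{1}{2}\, t_{\text{rev}} \approx 4.17\ \text{ms}
```

Seek time comes on top of that, so a read that misses the key cache, the row cache and the OS page cache and has to go to an HDD can plausibly take several milliseconds on its own, the same range as the observed 95th percentile. SSDs remove this mechanical delay.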

Answer 1 (votes: 0, 2013-05-11 23:56:53)
