
Throughput vs. replication factor: read performance of Cassandra

Original 2014-12-09 18:16:59 1 1 java/ r/ amazon-ec2/ replication/ cassandra-2.0

I have a cluster of 8 Cassandra nodes (Amazon EC2 instances). I'm evaluating the effect of increasing the replication factor on Cassandra's read performance. No writes are performed except the initial insert of 1 million objects. Read repair chance is disabled and I am using a consistency level of ONE. My observation so far is that read performance decreases as the replication factor increases. Is there an explanation for why this happens?

1 answer

Depending on what kind of read you are doing, read performance can decrease if the number of nodes stays the same while you increase the replication factor.

For example, if you run range queries on clustering columns, or any other query that requires the ALLOW FILTERING keyword, you can expect that behaviour in theory. With a higher replication factor, every node of the cluster stores more data: the data for its primary range of the ring plus the data for all the partition keys for which the node is a replica. Even though Cassandra has many optimizations to avoid performance degradation for such queries, adding more rows to each node will lower performance.
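To put rough numbers on "every node will store more data", here is a minimal sketch. It assumes a perfectly balanced ring in which each row is stored on exactly as many nodes as the replication factor; the function name `rows_per_node` is hypothetical and only illustrates the arithmetic, it is not part of any Cassandra API.

```python
def rows_per_node(total_rows: int, replication_factor: int, node_count: int) -> float:
    """Average number of row replicas held by each node.

    Assumption: the ring is evenly balanced, so the total number of
    replicas (total_rows * replication_factor) is spread uniformly
    over node_count nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if not 1 <= replication_factor <= node_count:
        raise ValueError("replication_factor must be between 1 and node_count")
    return total_rows * replication_factor / node_count


if __name__ == "__main__":
    # Figures from the question: 8 nodes, 1 million objects.
    for rf in (1, 2, 3, 5, 8):
        print(f"RF={rf}: about {rows_per_node(1_000_000, rf, 8):,.0f} rows per node")
```

With the 8 nodes and 1 million objects from the question, this gives about 125,000 rows per node at a replication factor of 1 and about 375,000 at a replication factor of 3, so a scan-like query on one node has three times as many rows to go through.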

For queries that use the partition key, the degradation should not be observable, since there will be almost the same number of accesses to the partition summary (in memory) and the partition index (on disk) before reaching the data. This holds, obviously, only if you do reads at consistency level ONE. If you do observe the slowdown in this case, I think it is related to an increased number of cache misses (if you use the key cache, the row cache or bloom filters, especially when you try to read non-existent data). None of these caches can hold all the data that is present on disk, and since each node now holds more data, the hit rate of every cache should drop. This can be verified using nodetool.
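The cache argument can be illustrated with a deliberately crude model. This is a sketch under strong assumptions, not how Cassandra sizes or manages its caches: reads are assumed to be uniformly random over the rows a node holds, and the cache is assumed to hold a fixed number of rows. The name `expected_hit_ratio` and the cache size of 100,000 rows are hypothetical.

```python
def expected_hit_ratio(cache_capacity_rows: int, rows_on_node: float) -> float:
    """Rough steady-state hit ratio of a fixed-size cache.

    Assumption: every row on the node is equally likely to be read,
    so the hit ratio is simply the fraction of the node's rows that
    fit in the cache, capped at 1.0.
    """
    if rows_on_node <= 0:
        return 0.0
    return min(1.0, cache_capacity_rows / rows_on_node)


if __name__ == "__main__":
    total_rows, node_count = 1_000_000, 8   # figures from the question
    cache_capacity_rows = 100_000           # hypothetical cache size
    for rf in (1, 2, 3, 5, 8):
        rows_on_node = total_rows * rf / node_count  # balanced ring assumed
        ratio = expected_hit_ratio(cache_capacity_rows, rows_on_node)
        print(f"RF={rf}: modelled hit ratio {ratio:.2f}")
```

In this model the cache size stays fixed while the rows per node grow with the replication factor, so the modelled hit ratio falls from 0.80 at a replication factor of 1 to roughly 0.27 at a replication factor of 3. Real hit rates depend on the access pattern and should be read from nodetool rather than from a model like this.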

Of course, for partition-key access, increasing the replication factor has many other advantages, since more replica nodes are available to answer your queries. But because your driver has more choices with a higher replication factor, the probability of asking the same node for a given row twice decreases. You are then less likely to find the row in a cache.
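The "asking the same node twice" effect can be quantified with a small sketch. It assumes the driver picks one of the replicas uniformly at random and independently for each read, which is a simplification; real load-balancing policies differ. The function name `prob_replica_already_served` is hypothetical.

```python
def prob_replica_already_served(previous_reads: int, replication_factor: int) -> float:
    """Probability that the replica chosen for the next read of a row
    has already served at least one of the previous reads of that row.

    Assumption: each read goes to one of the replication_factor
    replicas, chosen uniformly at random and independently.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if previous_reads < 0:
        raise ValueError("previous_reads must not be negative")
    # A given replica is missed by one read with probability 1 - 1/RF.
    return 1.0 - (1.0 - 1.0 / replication_factor) ** previous_reads


if __name__ == "__main__":
    for rf in (1, 2, 3, 5, 8):
        p = prob_replica_already_served(1, rf)
        print(f"RF={rf}: second read lands on the same replica with probability {p:.3f}")
```

Under this assumption the second read of a row reaches the replica that served the first read with probability 1/RF, so at a replication factor of 3 only about one re-read in three can benefit from whatever the first read left in that node's caches.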

Cassandra read performance almost a constant with replication

Cassandra replication factor greater than number of nodes

Cassandra Read/Get Performance

Regarding Cassandra Read Performance

Cassandra - SimpleStrategy requires a replication_factor strategy option

Cassandra read performance with Astyanax client

Cassandra batch query vs single insert performance

Cassandra Read/Write performance - High CPU

Postgresql Replication solutions and their performance

Direct ByteBuffer relative vs absolute read performance





solution1 1 2014-12-09 18:36:44
