Cassandra Read/Write performance - High CPU

I started using Cassandra a few days ago, and here is what I am trying to do.

I have about 2 million+ objects that maintain user profiles. I convert each object to JSON, compress it, and store it in a blob column. The average compressed JSON size is about 10KB. This is how my table looks in Cassandra:

Table:

CREATE TABLE dev.userprofile (uid varchar PRIMARY KEY, profile blob);

Select Query:

select profile from dev.userprofile where uid='';

Update Query:

update dev.userprofile set profile='<bytebuffer>' where uid = '<uid>'
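
The client-side write path looks roughly like the following sketch (using the DataStax Java driver from the dependencies listed further down, with GZIP standing in for my compression step; the contact point, uid and JSON payload are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ProfileWriter {

    // Gzip the serialized profile before storing it in the blob column.
    static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            PreparedStatement update =
                session.prepare("UPDATE dev.userprofile SET profile = ? WHERE uid = ?");

            String uid = "user-123";                 // placeholder key
            String json = "{\"name\":\"example\"}";  // placeholder profile JSON
            // blob columns bind as ByteBuffer in the Java driver
            session.execute(update.bind(ByteBuffer.wrap(gzip(json)), uid));
        }
    }
}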

Every hour, I get events from a queue that I apply to my userprofile objects. Each event corresponds to one userprofile object. I get about 1 million such events, so I have to update around 1M of the userprofile objects within a short time, i.e. update the object in my application, compress the JSON, and update the Cassandra blob. I have to finish updating all 1 million user profile objects, preferably within a few minutes, but I notice it's taking longer now.

While running my application, I notice that I can update around 400 profiles/second on average. I already see a lot of CPU iowait - 70%+ on the Cassandra instance. Also, the load is initially pretty high, around 16 (on an 8-vCPU instance), and then drops off to around 4.

What am I doing wrong? When I was updating smaller objects of about 2KB, I noticed that Cassandra operations/sec were much faster - I was able to get about 3000 ops/sec. Any thoughts on how I should improve the performance?

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-extras</artifactId>
  <version>3.1.0</version>
</dependency>

I have just a single node of Cassandra set up on an m4.2xlarge AWS instance for testing:

Single node Cassandra instance
m4.2xlarge aws ec2
500 GB General Purpose (SSD) 
IOPS - 1500 / 10000

nodetool cfstats output

Keyspace: dev
    Read Count: 688795
    Read Latency: 27.280683695439137 ms.
    Write Count: 688780
    Write Latency: 0.010008401811899301 ms.
    Pending Flushes: 0
        Table: userprofile
        SSTable count: 9
        Space used (live): 32.16 GB
        Space used (total): 32.16 GB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 13.56 MB
        SSTable Compression Ratio: 0.9984539538554672
        Number of keys (estimate): 2215817
        Memtable cell count: 38686
        Memtable data size: 105.72 MB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 6
        Local read count: 688807
        Local read latency: 29.879 ms
        Local write count: 688790
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 47
        Bloom filter false ratio: 0.00003
        Bloom filter space used: 7.5 MB
        Bloom filter off heap memory used: 7.5 MB
        Index summary off heap memory used: 2.07 MB
        Compression metadata off heap memory used: 3.99 MB
        Compacted partition minimum bytes: 216 bytes
        Compacted partition maximum bytes: 370.14 KB
        Compacted partition mean bytes: 5.82 KB
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1

nodetool cfhistograms output

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             3.00              9.89           2816.16              4768                 2
75%             3.00             11.86          43388.63              8239                 2
95%             4.00             14.24         129557.75             14237                 2
98%             4.00             20.50         155469.30             17084                 2
99%             4.00             29.52         186563.16             20501                 2
Min             0.00              1.92             61.22               216                 2
Max             5.00          74975.55        4139110.98            379022                 2

Dstat output

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
12.8 13.9 10.6|1460  31.1 |1.0  14 0.2|9.98G  892k 21.2G  234M|   0     0 | 119M 3291k|  63k   68k|  1   1  26  72   0   0|3366k 3338k
13.2 14.0 10.7|1458  28.4 |1.1  13 1.5|9.97G  884k 21.2G  226M|   0     0 | 119M 3278k|  61k   68k|  2   1  28  69   0   0|3396k 3349k
12.7 13.8 10.7|1477  27.6 |0.9  11 1.1|9.97G  884k 21.2G  237M|   0     0 | 119M 3321k|  69k   72k|  2   1  31  65   0   0|3653k 3605k
12.0 13.7 10.7|1474  27.4 |1.1 8.7 0.3|9.96G  888k 21.2G  236M|   0     0 | 119M 3287k|  71k   75k|  2   1  36  61   0   0|3807k 3768k
11.8 13.6 10.7|1492  53.7 |1.6  12 1.2|9.95G  884k 21.2G  228M|   0     0 | 119M 6574k|  73k   75k|  2   2  32  65   0   0|3888k 3829k

Edit

I switched to LeveledCompactionStrategy and disabled compression on the SSTables, but I don't see a big improvement:

There was a bit of improvement in profiles/sec updated; it's now 550-600 profiles/sec. But the CPU spikes remain, i.e. the iowait.

gcstats output

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
       755960                   83                   3449                      8        73179796264          107                   -1

Dstat output

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
7.02 8.34 7.33| 220  16.6 |0.0   0 1.1|10.0G  756k 21.2G  246M|   0     0 |  13M 1862k|  11k   13k|  1   0  94   5   0   0|   0     0
6.18 8.12 7.27|2674  29.7 |1.2 1.5 1.9|10.0G  760k 21.2G  210M|   0     0 | 119M 3275k|  69k   70k|  3   2  83  12   0   0|3906k 3894k
5.89 8.00 7.24|2455   314 |0.6 5.7   0|10.0G  760k 21.2G  225M|   0     0 | 111M   39M|  68k   69k|  3   2  51  44   0   0|3555k 3528k
5.21 7.78 7.18|2864  27.2 |2.6 3.2 1.4|10.0G  756k 21.2G  266M|   0     0 | 127M 3284k|  80k   76k|  3   2  57  38   0   0|4247k 4224k
4.80 7.61 7.13|2485   288 |0.1  12 1.4|10.0G  756k 21.2G  235M|   0     0 | 113M   36M|  73k   73k|  2   2  36  59   0   0|3664k 3646k
5.00 7.55 7.12|2576  30.5 |1.0 4.6   0|10.0G  760k 21.2G  239M|   0     0 | 125M 3297k|  71k   70k|  2   1  53  43   0   0|3884k 3849k
5.64 7.64 7.15|1873   174 |0.9  13 1.6|10.0G  752k 21.2G  237M|   0     0 | 119M   21M|  62k   66k|  3   1  27  69   0   0|3107k 3081k

You can notice the CPU spikes.

[screenshot: CPU utilization showing spikes]

My main concern is the iowait before I increase the load further. Anything specific I should be looking for that's causing this? Because 600 profiles/sec (i.e. 600 reads + writes) seems low to me.

Can you try LeveledCompactionStrategy? With 1:1 reads/writes on large objects like this, the IO saved on reads will probably offset the IO spent on the more expensive compactions.

If you're already compressing the data before sending it, you should turn off compression on the table. It's breaking the data into 64KB chunks which will be largely dominated by only 6 values that won't compress much further (as shown by the terrible SSTable Compression Ratio: 0.9984539538554672).

ALTER TABLE dev.userprofile
  WITH compaction = { 'class' : 'LeveledCompactionStrategy'  }
  AND compression = { 'sstable_compression' : '' };

That said, 400 profiles/second is very slow, and there may be some work to do on your client, which could potentially be the bottleneck as well. If you have a load of 4 on an 8-core system, it may not be Cassandra slowing things down. Make sure you're parallelizing your requests and issuing them asynchronously; sending requests sequentially is a common issue, as in the sketch below.
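
For illustration, a throttled async loop with the driver you're using might look something like this sketch (it assumes the prepared update statement and the gzip() helper from the earlier sketch; profiles is a placeholder uid-to-JSON map, and the 128-permit cap is just a starting point to tune):

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.Semaphore;

class AsyncProfileUpdater {

    // Throttled asynchronous updates: never more than `permits` requests in flight.
    static void updateAll(Session session, PreparedStatement update,
                          Map<String, String> profiles) throws InterruptedException, IOException {
        final int permits = 128;                       // starting point; tune for your hardware
        final Semaphore inFlight = new Semaphore(permits);

        for (Map.Entry<String, String> e : profiles.entrySet()) {
            inFlight.acquire();                        // back-pressure instead of flooding the node
            ResultSetFuture f = session.executeAsync(
                update.bind(ByteBuffer.wrap(ProfileWriter.gzip(e.getValue())), e.getKey()));
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t)  { inFlight.release(); }  // log/retry as needed
            });
        }
        inFlight.acquire(permits);                     // drain: wait for the last requests to finish
        inFlight.release(permits);
    }
}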

With larger blobs there is going to be an impact on GCs, so monitoring them and adding that information can be helpful. I would be surprised if 10KB objects affected it that much, but it's something to look out for and may require more JVM tuning.

If that helps, from there I would recommend tuning the heap and upgrading to at least 3.7 or the latest in the 3.0 line.
