Cassandra read/write performance - high CPU

Question

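I started using Cassandra a few days ago, and here is what I am trying to do.

I have around 2 million objects that hold users' profile information. I convert these objects to JSON, compress them, and store them in a blob column. The average compressed JSON size is about 10 KB. This is how my table looks in Cassandra:

Table: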

dev.userprofile (uid varchar primary key, profile blob);

Select query: select profile from dev.userprofile where uid='';

Update query:

update dev.userprofile set profile='<bytebuffer>' where uid = '<uid>'

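Every hour I get events from a queue, which I apply to my userprofile objects. Each event corresponds to one userprofile object. I get about 1 million such events, so I have to update around 1M userprofile objects in a short time, i.e. update the object in my application, compress the JSON, and update the Cassandra blob. I have to finish updating all 1 million userprofile objects within a few minutes, but I noticed that it takes much longer.

While running my application, I see that I can update around 400 profiles per second on average. I already see a lot of CPU iowait, over 70% on the Cassandra instance. Also, the load is quite high at first, around 16 (on an 8 vCPU instance), and then drops to around 4.

What am I doing wrong? When I was updating smaller objects of about 2 KB, Cassandra operations/sec were much faster: I was able to get about 3000 ops/sec. Any thoughts on how I can improve the performance?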
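To put the numbers above side by side, here is a rough back-of-the-envelope sketch. The rates and the object count come from the question; the 5-minute window is only an assumed reading of "a few minutes".

```python
# Figures from the question; TARGET_WINDOW_MIN is an assumption.
PROFILES_TO_UPDATE = 1_000_000
OBSERVED_RATE = 400        # profiles/sec with ~10 KB blobs
SMALL_OBJECT_RATE = 3000   # ops/sec seen with ~2 KB objects
TARGET_WINDOW_MIN = 5      # assumed meaning of "a few minutes"


def minutes_needed(total, rate_per_sec):
    """Minutes to process `total` items at a steady rate."""
    return total / rate_per_sec / 60


def required_rate(total, window_min):
    """Steady rate (items/sec) needed to finish within the window."""
    return total / (window_min * 60)


current_minutes = minutes_needed(PROFILES_TO_UPDATE, OBSERVED_RATE)
small_minutes = minutes_needed(PROFILES_TO_UPDATE, SMALL_OBJECT_RATE)
needed_rate = required_rate(PROFILES_TO_UPDATE, TARGET_WINDOW_MIN)

print(f"at {OBSERVED_RATE}/s: {current_minutes:.1f} min")
print(f"at {SMALL_OBJECT_RATE}/s: {small_minutes:.1f} min")
print(f"rate needed for a {TARGET_WINDOW_MIN}-minute window: {needed_rate:.0f}/s")
```

Under that assumption the gap is roughly 8x: the hourly batch takes about 42 minutes at the observed rate, while the rate seen with 2 KB objects would nearly fit the window.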

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-extras</artifactId>
  <version>3.1.0</version>
</dependency>

I have just a single-node Cassandra setup on an m4.2xlarge AWS instance for testing:

Single node Cassandra instance
m4.2xlarge aws ec2
500 GB General Purpose (SSD) 
IOPS - 1500 / 10000

nodetool cfstats output

Keyspace: dev
    Read Count: 688795
    Read Latency: 27.280683695439137 ms.
    Write Count: 688780
    Write Latency: 0.010008401811899301 ms.
    Pending Flushes: 0
        Table: userprofile
        SSTable count: 9
        Space used (live): 32.16 GB
        Space used (total): 32.16 GB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 13.56 MB
        SSTable Compression Ratio: 0.9984539538554672
        Number of keys (estimate): 2215817
        Memtable cell count: 38686
        Memtable data size: 105.72 MB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 6
        Local read count: 688807
        Local read latency: 29.879 ms
        Local write count: 688790
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 47
        Bloom filter false ratio: 0.00003
        Bloom filter space used: 7.5 MB
        Bloom filter off heap memory used: 7.5 MB
        Index summary off heap memory used: 2.07 MB
        Compression metadata off heap memory used: 3.99 MB
        Compacted partition minimum bytes: 216 bytes
        Compacted partition maximum bytes: 370.14 KB
        Compacted partition mean bytes: 5.82 KB
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1

nodetool cfhistograms output

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             3.00              9.89           2816.16              4768                 2
75%             3.00             11.86          43388.63              8239                 2
95%             4.00             14.24         129557.75             14237                 2
98%             4.00             20.50         155469.30             17084                 2
99%             4.00             29.52         186563.16             20501                 2
Min             0.00              1.92             61.22               216                 2
Max             5.00          74975.55        4139110.98            379022                 2

dstat output

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
12.8 13.9 10.6|1460  31.1 |1.0  14 0.2|9.98G  892k 21.2G  234M|   0     0 | 119M 3291k|  63k   68k|  1   1  26  72   0   0|3366k 3338k
13.2 14.0 10.7|1458  28.4 |1.1  13 1.5|9.97G  884k 21.2G  226M|   0     0 | 119M 3278k|  61k   68k|  2   1  28  69   0   0|3396k 3349k
12.7 13.8 10.7|1477  27.6 |0.9  11 1.1|9.97G  884k 21.2G  237M|   0     0 | 119M 3321k|  69k   72k|  2   1  31  65   0   0|3653k 3605k
12.0 13.7 10.7|1474  27.4 |1.1 8.7 0.3|9.96G  888k 21.2G  236M|   0     0 | 119M 3287k|  71k   75k|  2   1  36  61   0   0|3807k 3768k
11.8 13.6 10.7|1492  53.7 |1.6  12 1.2|9.95G  884k 21.2G  228M|   0     0 | 119M 6574k|  73k   75k|  2   2  32  65   0   0|3888k 3829k

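Edit

I switched to LeveledCompactionStrategy and disabled compression on the sstables, but I don't see a big improvement:

There is a small improvement in profiles/sec updated. It is now 550-600 profiles/sec. But the CPU spikes remain, i.e. the iowait.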

gcstats

   Interval (ms) Max GC Elapsed (ms)Total GC Elapsed (ms)Stdev GC Elapsed (ms)   GC Reclaimed (MB)         Collections      Direct Memory Bytes
          755960                  83                3449                   8         73179796264                 107                       -1

dstat

---load-avg--- --io/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total-
 1m   5m  15m | read  writ|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send
7.02 8.34 7.33| 220  16.6 |0.0   0 1.1|10.0G  756k 21.2G  246M|   0     0 |  13M 1862k|  11k   13k|  1   0  94   5   0   0|   0     0
6.18 8.12 7.27|2674  29.7 |1.2 1.5 1.9|10.0G  760k 21.2G  210M|   0     0 | 119M 3275k|  69k   70k|  3   2  83  12   0   0|3906k 3894k
5.89 8.00 7.24|2455   314 |0.6 5.7   0|10.0G  760k 21.2G  225M|   0     0 | 111M   39M|  68k   69k|  3   2  51  44   0   0|3555k 3528k
5.21 7.78 7.18|2864  27.2 |2.6 3.2 1.4|10.0G  756k 21.2G  266M|   0     0 | 127M 3284k|  80k   76k|  3   2  57  38   0   0|4247k 4224k
4.80 7.61 7.13|2485   288 |0.1  12 1.4|10.0G  756k 21.2G  235M|   0     0 | 113M   36M|  73k   73k|  2   2  36  59   0   0|3664k 3646k
5.00 7.55 7.12|2576  30.5 |1.0 4.6   0|10.0G  760k 21.2G  239M|   0     0 | 125M 3297k|  71k   70k|  2   1  53  43   0   0|3884k 3849k
5.64 7.64 7.15|1873   174 |0.9  13 1.6|10.0G  752k 21.2G  237M|   0     0 | 119M   21M|  62k   66k|  3   1  27  69   0   0|3107k 3081k

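You may notice the CPU spikes.

My main concern is the iowait, before I increase the load any further. Is there anything specific I should look for that could be causing this? Because 600 profiles/sec (i.e. 600 reads + writes) seems low to me.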

Answer 1

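Can you try LeveledCompactionStrategy? With 1:1 reads/writes on large objects like this, the IO saved on reads will probably offset the IO spent on the more expensive compactions.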
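As a rough way to see how much read IO is at stake, the dstat sample in the question can be turned into bytes read from disk per profile. This is only a sketch: it assumes dstat's "M" means MiB, that the ~400 profiles/sec figure applies to that sample, and it ignores that part of the disk reads comes from compaction rather than client reads.

```python
# Numbers read off the first dstat sample and the question text.
DISK_READ_BYTES_PER_SEC = 119 * 1024 * 1024  # "119M" in -dsk/total- read (assumed MiB)
READ_IOS_PER_SEC = 1460                      # --io/total- read
PROFILES_PER_SEC = 400                       # observed update rate
PROFILE_BYTES = 10 * 1024                    # ~10 KB compressed JSON per profile

bytes_per_read_io = DISK_READ_BYTES_PER_SEC / READ_IOS_PER_SEC
disk_bytes_per_profile = DISK_READ_BYTES_PER_SEC / PROFILES_PER_SEC
read_amplification = disk_bytes_per_profile / PROFILE_BYTES

print(f"average read IO size: {bytes_per_read_io / 1024:.1f} KiB")
print(f"disk bytes read per profile: {disk_bytes_per_profile / 1024:.1f} KiB")
print(f"read amplification vs. payload: {read_amplification:.1f}x")
```

By this estimate the node reads on the order of 300 KiB from disk for every 10 KB profile, which fits the cfhistograms output showing 3 to 4 SSTables touched per read. Fewer SSTables per read is exactly what leveled compaction is meant to buy.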

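If you are already compressing the data before sending it, you should turn off compression on the table. It breaks the data into 64 KB chunks, each of which will be dominated by only about 6 values, so it won't compress much (as the poor SSTable Compression Ratio: 0.9984539538554672 shows).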

ALTER TABLE dev.userprofile
  WITH compaction = { 'class' : 'LeveledCompactionStrategy'  }
  AND compression = { 'sstable_compression' : '' };
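The reason a second compression pass gains almost nothing can be checked with a small sketch. The profile structure below is made up for illustration, and `zlib` is used only as a stand-in for both the client-side compressor and the table-level one (Cassandra's own compressors differ, but the principle is the same).

```python
import json
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical profile; field names and values are illustrative only.
profile = {
    "uid": "user-42",
    "events": [
        {
            "type": rng.choice(["click", "view", "purchase", "search"]),
            "ts": 1477880000 + rng.randrange(86400),
            "page": f"/item/{rng.randrange(10000)}",
        }
        for _ in range(500)
    ],
}

raw = json.dumps(profile).encode("utf-8")
client_compressed = zlib.compress(raw)           # what the application would store in the blob
recompressed = zlib.compress(client_compressed)  # stand-in for a second, table-level pass

first_pass_ratio = len(client_compressed) / len(raw)
second_pass_ratio = len(recompressed) / len(client_compressed)

print(f"first pass:  {first_pass_ratio:.2f} of original size")
print(f"second pass: {second_pass_ratio:.2f} of already-compressed size")
```

The first pass shrinks the JSON substantially, while the second pass leaves the size essentially unchanged, so the table-level compression only costs CPU on every read and write.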

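However, 400 profiles/second is very, very slow, and there may be some work on your client that could be the bottleneck as well. If you have a load of 4 on an 8-core system, it may not be Cassandra that is slowing things down. Make sure you are parallelizing your requests and issuing them asynchronously; sending requests sequentially is a common problem.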
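The pattern meant here is to keep many requests in flight at once while capping how many are outstanding. Below is a minimal, driver-free sketch of that idea; `update_profile` is a hypothetical stand-in that only sleeps, and the cap of 64 is an arbitrary assumption to tune. With the Java driver 3.x used in the question, the asynchronous call would be `Session.executeAsync`.

```python
import asyncio

MAX_IN_FLIGHT = 64  # assumed cap on concurrent requests; tune against the cluster


async def update_profile(uid):
    """Hypothetical stand-in for one read-modify-write round trip."""
    await asyncio.sleep(0.001)  # simulated network + server latency
    return uid


async def update_all(uids, max_in_flight=MAX_IN_FLIGHT):
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def bounded(uid):
        nonlocal in_flight, peak
        async with semaphore:  # waits while max_in_flight requests are outstanding
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await update_profile(uid)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(uid) for uid in uids))
    return results, peak


uids = [f"user-{i}" for i in range(1000)]
done, peak_in_flight = asyncio.run(update_all(uids))
print(len(done), "updates completed, peak in flight:", peak_in_flight)
```

With a sequential client, throughput is bounded by one round-trip latency per request; with a bounded pool of in-flight requests it scales with the cap until the server or the disk becomes the limit.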

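With larger blobs there is going to be an impact on GC, so monitoring GC and adding that information may help. I would be surprised if 10 KB objects affected it that much, but it is something to watch for and may require more JVM tuning.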
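For what it is worth, the `nodetool gcstats` line posted in the question's edit can be reduced to two summary numbers. This is a rough reading of that single sample, not a substitute for watching GC logs under load.

```python
# Values copied from the gcstats output in the question.
INTERVAL_MS = 755960         # Interval (ms)
TOTAL_GC_ELAPSED_MS = 3449   # Total GC Elapsed (ms)
MAX_GC_ELAPSED_MS = 83       # Max GC Elapsed (ms)
COLLECTIONS = 107            # Collections

gc_fraction = TOTAL_GC_ELAPSED_MS / INTERVAL_MS
avg_pause_ms = TOTAL_GC_ELAPSED_MS / COLLECTIONS

print(f"time spent in GC: {gc_fraction * 100:.2f}% of the interval")
print(f"average pause: {avg_pause_ms:.1f} ms (max {MAX_GC_ELAPSED_MS} ms)")
```

Less than half a percent of wall-clock time in GC, with pauses of a few tens of milliseconds, suggests that GC is not what limits throughput in that sample.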

If that helps, I would recommend tuning the heap from there and upgrading to at least 3.7 or the latest release in the 3.0 line.

Solution 1
1 2016-10-31 02:57:33

Cassandra讀/寫性能-高CPU

問題描述

1 個解決方案

解決方案1 1 2016-10-31 02:57:33

解決方案1
1 2016-10-31 02:57:33