Cassandra - JVM OOM direct buffer errors

We have a DataStax Enterprise cluster with the following configuration:

java version "1.8.0_181"
DataStax Enterprise Version: 6.0.0

Number of Nodes: 3
Node Listing:
Name: localhost - xx.xx.xx.01
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 5 GB / 125 GB
Name: localhost - xx.xx.xx.02
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 5 GB / 125 GB
Name: localhost - xx.xx.xx.03
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 6 GB / 125 GB

Keyspace size - 1.34 GB

Yesterday we had a lot of OOM errors on one of the three nodes, and similar errors appeared on the other nodes after we restarted the first one. Error details:

ERROR [CompactionExecutor:4477] 2018-08-29 13:23:00,320  JVMStabilityInspector.java:117 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_181]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_181]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_181]
        at org.apache.cassandra.io.compress.BufferType$2.allocate(BufferType.java:39) ~[dse-db-all-4.0.0.2284.jar:6.0.0]
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:89) ~[dse-db-all-4.0.0.2284.jar:6.0.0]
        at org.apache.cassandra.io.sstable.format.trieindex.TrieIndexSSTableWriter.<init>(TrieIndexSSTableWriter.java:100) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.io.sstable.format.trieindex.TrieIndexFormat$WriterFactory.open(TrieIndexFormat.java:110) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:108) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.switchCompactionLocation(DefaultCompactionWriter.java:71) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.maybeSwitchWriter(CompactionAwareWriter.java:182) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:144) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:210) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:92) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:101) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:310) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_181]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_181]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_181]

This error seems to be related to the java.nio direct buffer cache, which has no upper bound and keeps growing until an OOM occurs (https://support.datastax.com/hc/en-us/articles/360000863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache). We are seeing a continuous increase in memory utilization on all of our Cassandra nodes. This behaviour persists even after restarting the Cassandra nodes.
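To confirm that direct (off-heap) buffers are what is growing, the JVM's NIO buffer pools can be read through the standard `BufferPoolMXBean`. The following is a minimal sketch, not something taken from the post: the class name `DirectBufferReport` and the 1 MiB probe allocation are illustrative assumptions. On a running node the same numbers are exposed over JMX under `java.nio:type=BufferPool,name=direct`.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name is an assumption for illustration.
final class DirectBufferReport {

    /** Returns the bytes currently used by each NIO buffer pool, keyed by pool name. */
    static Map<String, Long> memoryUsedByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            used.put(pool.getName(), pool.getMemoryUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        // Allocate 1 MiB off-heap so the "direct" pool shows a visible value.
        ByteBuffer probe = ByteBuffer.allocateDirect(1 << 20);

        memoryUsedByPool().forEach((name, bytes) ->
                System.out.printf("%-8s %d bytes used%n", name, bytes));

        // Keep the probe reachable until after the report is printed.
        System.out.println("probe capacity: " + probe.capacity());
    }
}
```

Sampling the `direct` pool periodically and comparing it with the process's resident memory should show whether the growth comes from direct buffers or from somewhere else. HotSpot also accepts `-XX:MaxDirectMemorySize` to cap this pool; whether a cap helps in this case is not established by the post.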

JVM config:

-XX:+AlwaysPreTouch
-Dcassandra.disable_auth_caches_remote_configuration=false
-Dcassandra.expiration_date_overflow_policy="REJECT"
-Dcassandra.force_default_indexing_page_size=false
-Dcassandra.join_ring=true
-Dcassandra.load_ring_state=true
-Dcassandra.write_survey=false
#-XX:ConcGCThreads=
-ea
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:GCLogFileSize=10M
-XX:+HeapDumpOnOutOfMemoryError
#-Xmsauto
#-XX:InitiatingHeapOccupancyPercent=
-Dio.netty.eventLoop.maxPendingTasks=65536
-Djava.net.preferIPv4Stack=true
-XX:MaxGCPauseMillis=500
#-Xmxauto
-XX:NumberOfGCLogFiles=10
-Dsun.nio.PageAlignDirectMemory=true
#-XX:ParallelGCThreads=
-Xss256k
-XX:+PerfDisableSharedMem
-XX:+PreserveFramePointer
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-Dcassandra.printHeapHistogramOnOutOfMemoryError=false
-XX:+PrintPromotionFailure
-XX:+PrintTenuringDistribution
-XX:+ResizeTLAB
-XX:-RestrictContended
-XX:StringTableSize=1000003
-XX:ThreadPriorityPolicy=42
-XX:+UnlockDiagnosticVMOptions
-XX:+UseGCLogFileRotation
-XX:+UseThreadPriorities
-XX:+UseTLAB





-XX:+UseG1GC


JVM_ON_OUT_OF_MEMORY_ERROR_OPT="-XX:OnOutOfMemoryError=kill -9 %p"



-Dcom.sun.management.jmxremote.authenticate=false
-Dcassandra.jmx.local.port=7199

Can you give more information about the heap at the time the OOM occurs (old gen, eden, survivor spaces and so on)?
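One way to collect those per-generation numbers is through the standard `MemoryPoolMXBean` API. This is a rough sketch under assumptions: the class name `HeapPoolReport` is made up for illustration, and the exact pool names depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is an assumption for illustration.
final class HeapPoolReport {

    /** Builds one summary line per heap memory pool (eden, survivor, old gen, ...). */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            long maxKb = usage.getMax() < 0 ? -1 : usage.getMax() / 1024;
            lines.add(String.format("%s: used=%d KB, committed=%d KB, max=%d KB",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024, maxKb));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```

Since the posted configuration already enables `-XX:+PrintGCDetails` and `-XX:+PrintHeapAtGC`, the GC log written around the time of the OOM should contain the same breakdown without any extra code.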

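  1. Check your JVM arguments, for example -XX:+DisableExplicitGC. If this flag is enabled, System.gc() does nothing. On Java 8, java.nio.Bits.reserveMemory (the top frame of your stack trace) calls System.gc() to reclaim unreachable direct buffers once the direct memory limit is reached, so disabling explicit GC can lead to this error.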

  2. Does your JVM run full GCs (FGC)? Also check where in your code DirectByteBuffer objects are allocated.
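Both checks can be scripted with the standard management beans. The sketch below is only an illustration: the class name `ExplicitGcCheck` and its helper method are assumptions, and it inspects only the arguments the JVM reports through `RuntimeMXBean.getInputArguments()`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class; the name is an assumption for illustration.
final class ExplicitGcCheck {

    /**
     * Returns true if the given JVM arguments leave explicit GC disabled.
     * The flag defaults to off, and the last occurrence on the command line wins.
     */
    static boolean explicitGcDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DisableExplicitGC")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("explicit GC disabled: " + explicitGcDisabled(jvmArgs));

        // Collection counts per collector; with G1 the old-generation bean counts full GCs.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The flag list posted in the question does not contain `DisableExplicitGC`, but options can also be added by startup scripts, so checking the arguments of the live process is more reliable than reading the options file.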
