
Elasticsearch Memory Issue - ES Process Consuming ALL RAM

We are having an issue on our production Elasticsearch cluster where Elasticsearch seems to be consuming, over time, all of the RAM on each server. Each box has 128GB of RAM, so we run two instances per box and allocate 30GB to each JVM heap; the remaining 68GB is left for the OS and Lucene. We rebooted each of the servers last week and memory started off right where we expected, with each Elasticsearch process using about 24% of RAM. It has now been almost a week and consumption has grown to around 40% per Elasticsearch instance. I have attached our config file in the hope that someone can help figure out why Elasticsearch keeps growing past the memory limit we have set.

Currently we are running ES 1.3.2 but will be upgrading to 1.4.2 next week with our next release.
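As a quick way to see how much of that memory is actually JVM heap versus everything else, the cat nodes API gives a per-node view of heap usage alongside overall RAM usage (a sketch, assuming the default HTTP port 9200 on the first instance; the second instance on each box would normally bind to the next free port):

    # Per-node view of heap vs. overall RAM usage (assumes HTTP on 9200)
    curl -s 'localhost:9200/_cat/nodes?v'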

Here is a view of top (extra fields removed for clarity) from right after the reboot:

      PID USER      %MEM    TIME+
     2178 elastics  24.1  1:03.49
     2197 elastics  24.3  1:07.32

and one today:

      PID USER      %MEM    TIME+
     2178 elastics  40.5  2927:50
     2197 elastics  40.1  3000:44

elasticsearch-0.yml:

    cluster.name: PROD
    node.name: "PROD6-0"
    node.master: true
    node.data: true
    node.rack: PROD6
    cluster.routing.allocation.awareness.force.rack.values: PROD4,PROD5,PROD6,PROD7,PROD8,PROD9,PROD10,PROD11,PROD12
    cluster.routing.allocation.awareness.attributes: rack
    node.max_local_storage_nodes: 2
    path.data: /es_data1
    path.logs: /var/log/elasticsearch
    bootstrap.mlockall: true
    transport.tcp.port: 9300
    http.port: 9200
    http.max_content_length: 400mb
    gateway.recover_after_nodes: 17
    gateway.recover_after_time: 1m
    gateway.expected_nodes: 18
    cluster.routing.allocation.node_concurrent_recoveries: 20
    indices.recovery.max_bytes_per_sec: 200mb
    discovery.zen.minimum_master_nodes: 10
    discovery.zen.ping.timeout: 3s
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: XXX
    index.search.slowlog.threshold.query.warn: 10s
    index.search.slowlog.threshold.query.info: 5s
    index.search.slowlog.threshold.query.debug: 2s
    index.search.slowlog.threshold.fetch.warn: 1s
    index.search.slowlog.threshold.fetch.info: 800ms
    index.search.slowlog.threshold.fetch.debug: 500ms
    index.indexing.slowlog.threshold.index.warn: 10s
    index.indexing.slowlog.threshold.index.info: 5s
    index.indexing.slowlog.threshold.index.debug: 2s
    monitor.jvm.gc.young.warn: 1000ms
    monitor.jvm.gc.young.info: 700ms
    monitor.jvm.gc.young.debug: 400ms
    monitor.jvm.gc.old.warn: 10s
    monitor.jvm.gc.old.info: 5s
    monitor.jvm.gc.old.debug: 2s
    action.auto_create_index: .marvel-*
    action.disable_delete_all_indices: true
    indices.cache.filter.size: 10%
    index.refresh_interval: -1
    threadpool.search.type: fixed
    threadpool.search.size: 48
    threadpool.search.queue_size: 10000000
    cluster.routing.allocation.cluster_concurrent_rebalance: 6
    indices.store.throttle.type: none
    index.reclaim_deletes_weight: 4.0
    index.merge.policy.max_merge_at_once: 5
    index.merge.policy.segments_per_tier: 5
    marvel.agent.exporter.es.hosts: ["1.1.1.1:9200","1.1.1.1:9200"]
    marvel.agent.enabled: true
    marvel.agent.interval: 30s
    script.disable_dynamic: false
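One way to confirm the running instances actually picked up this file (rather than a default elasticsearch.yml) is to ask a node for its effective settings; a quick check, assuming the default HTTP port:

    # Dump the settings each running node actually loaded
    curl -s 'localhost:9200/_nodes/settings?pretty'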

and here is /etc/sysconfig/elasticsearch-0 :

    # Directory where the Elasticsearch binary distribution resides
    ES_HOME=/usr/share/elasticsearch
    # Heap Size (defaults to 256m min, 1g max)
    ES_HEAP_SIZE=30g
    # Heap new generation
    #ES_HEAP_NEWSIZE=
    # max direct memory
    #ES_DIRECT_SIZE=
    # Additional Java OPTS
    #ES_JAVA_OPTS=
    # Maximum number of open files
    MAX_OPEN_FILES=65535
    # Maximum amount of locked memory
    MAX_LOCKED_MEMORY=unlimited
    # Maximum number of VMA (Virtual Memory Areas) a process can own
    MAX_MAP_COUNT=262144
    # Elasticsearch log directory
    LOG_DIR=/var/log/elasticsearch
    # Elasticsearch data directory
    DATA_DIR=/es_data1
    # Elasticsearch work directory
    WORK_DIR=/tmp/elasticsearch
    # Elasticsearch conf directory
    CONF_DIR=/etc/elasticsearch
    # Elasticsearch configuration file (elasticsearch.yml)
    CONF_FILE=/etc/elasticsearch/elasticsearch-0.yml
    # User to run as, change this to a specific elasticsearch user if possible
    # Also make sure, this user can write into the log directories in case you change them
    # This setting only works for the init script, but has to be configured separately for systemd startup
    ES_USER=elasticsearch
    # Configure restart on package upgrade (true, every other setting will lead to not restarting)
    #RESTART_ON_UPGRADE=true
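Since bootstrap.mlockall and the open-file/locked-memory limits above only help if they actually took effect, it may be worth checking what the running processes report (assuming the default HTTP port):

    # Confirm mlockall succeeded and the file-descriptor limit was applied
    curl -s 'localhost:9200/_nodes/process?pretty'
    # Expect "mlockall" : true and "max_file_descriptors" : 65535 in the output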

Please let me know if there is any other data I can provide. Thanks in advance for any help.

Output of free -m from one of the servers:

                     total       used       free     shared    buffers     cached
    Mem:            129022     119372       9650          0        219      46819
    -/+ buffers/cache:           72333      56689
    Swap:            28603          0      28603

What you are seeing isn't heap blow-out; the heap will always be capped by what you set in the config. free -m and top report OS-level usage, so the growth you see there is most likely the OS caching filesystem reads (the page cache).
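One way to confirm this is to compare what the JVM itself reports for heap against the OS memory stats Elasticsearch collects; for example (assuming the default HTTP port):

    # JVM heap usage vs. OS memory stats per node
    curl -s 'localhost:9200/_nodes/stats/jvm,os?pretty'

The jvm.mem.heap_used figures should stay within the 30GB you configured even while top shows the overall process footprint growing.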

This will not cause a Java OOM.

If you are experiencing a Java OOM, which is directly tied to the Java heap running out of space, then something else is at play. Your logs may provide some information about that.
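For example, a heap problem would usually show up in the node logs as OutOfMemoryError entries or long GC pauses flagged by your monitor.jvm settings; a quick way to scan for those (log path and file name inferred from the config above, so adjust if yours differ):

    # Scan the Elasticsearch logs for OOM errors and GC warnings
    grep -iE 'OutOfMemoryError|\[gc\]' /var/log/elasticsearch/PROD.log | tail -n 50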
