
ElasticSearch + Kibana - Unique count using pre-computed hashes

update : Added

I want to perform a unique count on my ElasticSearch cluster. The cluster contains about 50 million records.

I've tried the following methods:

First method

Mentioned in this section :

Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory.

Second method

Mentioned in this section :

Unless you configure Elasticsearch to use doc_values as the field data format, the use of aggregations and facets is very demanding on heap space.

My property mapping

"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3"
    }
  }
}

The problem

When I use unique count on my_prop.hash in Kibana I receive the following error:

Data too large, data for [my_prop.hash] would be larger than limit

ElasticSearch has a 2 GB heap. The above also fails for a single index with 4 million records.

My questions

  1. Am I missing something in my configuration?
  2. Should I upgrade my machine? That does not seem like a scalable solution.

ElasticSearch query

Was generated by Kibana: http://pastebin.com/hf1yNLhE

ElasticSearch Stack trace

http://pastebin.com/BFTYUsVg

That error means there is not enough heap memory (more specifically, memory for fielddata) to hold all the values of the hash field, so you need to move them off the heap and onto disk, which means using doc_values.

Since you are already using doc_values for my_prop, I suggest doing the same for my_prop.hash. Settings on the main field are not inherited by its sub-fields, so the sub-field needs them explicitly: "hash": { "type": "murmur3", "index": "no", "doc_values": true }.
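
Put together with the mapping from the question, the property might look like the sketch below. It only merges the suggested sub-field settings into the original mapping and changes nothing else. It assumes an Elasticsearch version where murmur3 is available as a field type, as the question's mapping implies.

```json
"my_prop": {
  "index": "not_analyzed",
  "fielddata": {
    "format": "doc_values"
  },
  "doc_values": true,
  "type": "string",
  "fields": {
    "hash": {
      "type": "murmur3",
      "index": "no",
      "doc_values": true
    }
  }
}
```

As far as I know, doc_values cannot be switched on for documents that are already indexed, so you would likely need to reindex into an index created with this mapping before the unique count (a cardinality aggregation on my_prop.hash) reads from disk instead of fielddata.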
