ElasticSearch. Total number of unique terms in an index

Is there a way to get the total number of terms in an index through the ES API? I need to estimate the prior probability of a term occurring in the index:

total_term_frequency/total_terms_in_index

I can access ttf, but not the total number of terms stored in the index.
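To make the ratio above concrete, here is a minimal sketch in Python. The function name `term_prior` and its argument names are hypothetical, chosen only to mirror the formula; both counts are assumed to have been obtained from Elasticsearch already.

```python
def term_prior(total_term_frequency: int, total_terms_in_index: int) -> float:
    """Estimate the prior probability of a term as ttf / total terms.

    Both arguments are plain counts supplied by the caller; nothing here
    talks to Elasticsearch.
    """
    if total_terms_in_index <= 0:
        raise ValueError("total_terms_in_index must be positive")
    if total_term_frequency < 0:
        raise ValueError("total_term_frequency must not be negative")
    return total_term_frequency / total_terms_in_index


# Illustrative numbers only: a term seen 5 times among 100 terms.
print(term_prior(5, 100))
```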

I think the cardinality aggregation is what you're looking for. It returns an approximate count of the distinct values (here, terms) in a field.

For example:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "term_count": {
         "cardinality": {
            "field": "doc_text"
         }
      }
   }
}
...
{
   "took": 7,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "term_count": {
         "value": 161
      }
   }
}
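If you want to work with this from a script, the request body and the response can be handled roughly as follows. This is only a sketch using the Python standard library: the helper names `build_cardinality_query` and `extract_term_count` are hypothetical, and actually sending the request to the cluster is left out. The sample response is trimmed to the part of the output above that matters here.

```python
import json


def build_cardinality_query(field, agg_name="term_count"):
    """Build a search body that returns no hits, only a cardinality aggregation."""
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "cardinality": {"field": field}
            }
        },
    }


def extract_term_count(response, agg_name="term_count"):
    """Pull the (approximate) distinct-term count out of a search response."""
    return response["aggregations"][agg_name]["value"]


# Response trimmed from the example output shown above.
sample_response = json.loads("""
{
   "took": 7,
   "timed_out": false,
   "aggregations": {
      "term_count": {
         "value": 161
      }
   }
}
""")

print(json.dumps(build_cardinality_query("doc_text"), indent=3))
print(extract_term_count(sample_response))
```

Keep in mind that the value is a count of distinct terms in the field, and that it is an estimate rather than an exact figure.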

Here is some code I used to play around with it:

http://sense.qbox.io/gist/d5625c80946f332718b0fa166bba27efd264b76e
