
ElasticSearch. Total number of unique terms in an index

Is there a way to access the total number of terms in an index through the ES API? I need to estimate the prior probability of a term occurring in the index:

total_term_frequency/total_terms_in_index

I can access ttf (total term frequency), but I can't find the total number of terms stored in the index.
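To make the estimate concrete, here is a minimal Python sketch of the ratio above. The function name `prior_probability` and the sample numbers are hypothetical, chosen only for illustration; the sketch assumes both counts have already been obtained from somewhere.

```python
def prior_probability(total_term_frequency: int, total_terms_in_index: int) -> float:
    """Estimate the prior probability of a term as ttf / total terms in the index."""
    if total_terms_in_index <= 0:
        raise ValueError("total_terms_in_index must be positive")
    if not 0 <= total_term_frequency <= total_terms_in_index:
        raise ValueError("total_term_frequency must lie between 0 and total_terms_in_index")
    return total_term_frequency / total_terms_in_index


# Hypothetical counts, not taken from a real index.
p = prior_probability(25, 10_000)
print(p)
```

The range check is a design choice for this sketch: a ttf larger than the denominator would signal that the two counts were measured over different fields or indices.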

I think the cardinality aggregation is what you're looking for. It returns an approximate count of the distinct values in a field.

For example:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "term_count": {
         "cardinality": {
            "field": "doc_text"
         }
      }
   }
}
...
{
   "took": 7,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "term_count": {
         "value": 161
      }
   }
}
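If you want to do the same thing from a script, the request body and the response handling might look roughly like the sketch below. It uses only the Python standard library and does not contact a cluster; the helper names `build_cardinality_request` and `extract_cardinality` are hypothetical, and the sample response is the one shown above.

```python
import json


def build_cardinality_request(field: str, agg_name: str = "term_count") -> dict:
    """Build the search body for a cardinality aggregation on one field."""
    return {
        "size": 0,  # skip the hits, only the aggregation result is needed
        "aggs": {agg_name: {"cardinality": {"field": field}}},
    }


def extract_cardinality(response: dict, agg_name: str = "term_count") -> int:
    """Read the aggregation value out of a parsed search response."""
    return response["aggregations"][agg_name]["value"]


# The response from the example above, parsed as JSON.
sample_response = json.loads("""
{
   "took": 7,
   "timed_out": false,
   "_shards": {"total": 5, "successful": 5, "failed": 0},
   "hits": {"total": 4, "max_score": 0, "hits": []},
   "aggregations": {"term_count": {"value": 161}}
}
""")

request_body = build_cardinality_request("doc_text")
unique_terms = extract_cardinality(sample_response)

print(json.dumps(request_body, indent=3))
print(unique_terms)
```

Note that this value is the number of distinct terms in the field, so it is not the same thing as the sum of all term occurrences.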

Here is some code I used to play around with it:

http://sense.qbox.io/gist/d5625c80946f332718b0fa166bba27efd264b76e
