
Elasticsearch document count returned by _stats versus _count

I'm trying to get statistics and document counts for the indices in my Elasticsearch 1.2.1 cluster. I was using the Indices Stats API (the _stats endpoint) to get the total number of primary documents and their size on disk. However, when I started experimenting with the Count API (the _count endpoint), I noticed that the two values do not match.

What is the difference between these values? The documentation does not make it entirely clear, although it does hint that the value returned by the Indices Stats API can change when the index is refreshed. This makes me wonder whether it is a lower-level value coming from the Lucene layer.

Indices Stats API

localhost:9200/my_index/_stats

...snip...

"_all" : {
  "primaries" : {
    "docs" : {
      "count" : 8284,
      "deleted" : 87
    },
  }
}

...snip...

Count API

localhost:9200/my_index/_count

{
  "count" : 6854,
  "_shards" : {
    "total" : 40,
    "successful" : 40,
    "failed" : 0
  }
}
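
To compare the two numbers side by side, here is a minimal Python sketch using only the standard library. It assumes an unauthenticated node at localhost:9200 and reads exactly the fields shown above (_all.primaries.docs.count from _stats, count from _count); the helper names fetch_json and count_gap are hypothetical. The sample responses are trimmed copies of the output above, so the script runs without a cluster.

```python
import json
import urllib.request


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (assumes no auth, plain HTTP)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def count_gap(stats: dict, count: dict) -> int:
    """Return primaries docs.count from _stats minus the _count result."""
    stats_docs = stats["_all"]["primaries"]["docs"]["count"]
    top_level_docs = count["count"]
    return stats_docs - top_level_docs


# Trimmed copies of the responses above (values taken from the question).
stats_response = {"_all": {"primaries": {"docs": {"count": 8284, "deleted": 87}}}}
count_response = {"count": 6854, "_shards": {"total": 40, "successful": 40, "failed": 0}}

if __name__ == "__main__":
    # Against a live node you could instead do, for example:
    # stats_response = fetch_json("http://localhost:9200/my_index/_stats")
    # count_response = fetch_json("http://localhost:9200/my_index/_count")
    print(count_gap(stats_response, count_response))  # 1430
```

With the numbers from the question, the _stats figure is 1430 documents higher than the _count figure.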

Actually, the docs.count you get back from the Indices Stats API also includes nested documents, because each object in a field mapped as nested is indexed as a separate hidden Lucene document alongside its parent. It will therefore always be greater than or equal to the count you get back from the Count API, which only counts top-level documents, i.e. the documents a search query can return.
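
The arithmetic can be illustrated with a small Python model. This is a simplified sketch, not how Elasticsearch computes its statistics: it assumes only top-level fields can be nested, ignores nesting inside nested objects and ignores deleted documents. The field name comments and the helper lucene_doc_count are hypothetical.

```python
from typing import Iterable, Mapping


def lucene_doc_count(sources: Iterable[Mapping], nested_fields: Iterable[str]) -> int:
    """Approximate the Lucene-level docs.count for a batch of source documents.

    Simplified model: only the listed top-level fields are treated as nested,
    there is no nesting inside nested objects, and deletions are ignored.
    """
    nested_fields = list(nested_fields)
    total = 0
    for source in sources:
        total += 1  # the root document, which _count (and search) can see
        for field in nested_fields:
            value = source.get(field)
            if value is None:
                continue
            # Each nested object becomes its own hidden Lucene document;
            # a single object is treated like a one-element list.
            objects = value if isinstance(value, list) else [value]
            total += len(objects)
    return total


# Hypothetical documents with a hypothetical nested field "comments".
docs = [
    {"title": "a", "comments": [{"author": "x"}, {"author": "y"}]},
    {"title": "b", "comments": []},
    {"title": "c"},
]

print(len(docs))                             # top-level docs (_count): 3
print(lucene_doc_count(docs, ["comments"]))  # Lucene docs (docs.count): 5
```

Under this model the gap between the two counts equals the total number of nested objects across all documents.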

So, judging by the numbers you posted, your index probably contains documents with fields mapped with the nested type. Does that sound right?
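
One way to check is to fetch the mapping (for example GET localhost:9200/my_index/_mapping) and look for fields with "type": "nested". The Python sketch below walks such a response; it assumes the 1.x response shape {index: {"mappings": {type: {"properties": ...}}}}, and the index, type and field names in the sample are hypothetical.

```python
from typing import Dict, List, Tuple


def nested_paths(properties: Dict, prefix: str = "") -> List[str]:
    """Recursively collect dotted paths of fields whose type is "nested"."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "nested":
            paths.append(path)
        # Both object and nested fields can carry sub-properties.
        if "properties" in spec:
            paths.extend(nested_paths(spec["properties"], path + "."))
    return paths


def find_nested_fields(mapping_response: Dict) -> List[Tuple[str, str, str]]:
    """Return (index, type, path) for every nested field in a _mapping response."""
    found = []
    for index_name, index_body in mapping_response.items():
        for type_name, type_body in index_body.get("mappings", {}).items():
            for path in nested_paths(type_body.get("properties", {})):
                found.append((index_name, type_name, path))
    return found


# Hypothetical response from GET localhost:9200/my_index/_mapping
sample = {
    "my_index": {
        "mappings": {
            "my_type": {
                "properties": {
                    "title": {"type": "string"},
                    "comments": {
                        "type": "nested",
                        "properties": {
                            "author": {"type": "string"},
                            "replies": {
                                "type": "nested",
                                "properties": {"text": {"type": "string"}},
                            },
                        },
                    },
                }
            }
        }
    }
}

print(find_nested_fields(sample))
# [('my_index', 'my_type', 'comments'), ('my_index', 'my_type', 'comments.replies')]
```

The walker also descends into nested fields, because a nested field inside another nested field produces hidden documents of its own and widens the gap further.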
