
Elasticsearch: document size and query performance

I have an ES index with medium-sized documents (roughly 15-30 Mb each).

Each document has a boolean field, and most of the time users just want to know whether a specific document ID has that field set to true.

Will document size affect the performance of this query?

{
   "size": 1,
   "query": {
      "term": {
         "my_field": true
      }
   },
   "_source": [
      "my_field"
   ]
}

And will a "size":0 query result in better performance?

Adding "size":0 to your query avoids some network transfer, which will improve your response time.

But as I understand your use case, you can use the count API instead.

An example query:

curl -XPOST 'http://localhost:9200/test/_count' -H 'Content-Type: application/json' -d '{
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "id": xxxxx
              }
            },
            {
              "term": {
                "bool_field": true
              }
            }
          ]
        }
      }
    }'

With this query you only need to check whether the returned count is greater than zero. That tells you whether the document with the given id has the boolean field set to true or false, depending on the value you pass for bool_field in the query. This will be quite fast.
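As a rough illustration of the count approach, the sketch below builds the same request body in Python and interprets a count response. The function names, the document ID 1000, and the sample response are made up for the example; only the "id" and "bool_field" field names come from the query above. Nothing here talks to a cluster.

```python
import json


def build_count_body(doc_id, field, value=True):
    """Build a _count request body matching one document ID and one boolean field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"id": doc_id}},    # "id" is the field name used in the query above
                    {"term": {field: value}},
                ]
            }
        }
    }


def field_matches(count_response):
    """Return True when the _count response reports at least one matching document."""
    return count_response.get("count", 0) > 0


# json.dumps writes Python's True as JSON's lowercase true, which is what Elasticsearch expects.
body = build_count_body(1000, "bool_field")
print(json.dumps(body, indent=2))

# Hand-written sample response for illustration only, not real cluster output.
sample_response = {"count": 1, "_shards": {"total": 5, "successful": 5, "failed": 0}}
print(field_matches(sample_response))
```

The body can then be sent to the _count endpoint with whatever HTTP client you already use.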

Because Elasticsearch indexes your fields, document size should not be a big problem for query performance. Using size 0 does not change how the query runs inside Elasticsearch, but it does reduce the time spent retrieving the document, because less data is transferred over the network.

If you just want to check one boolean field for a specific document, you can simply use the Get API and retrieve only the field you want to check, like this:

curl -XGET 'http://localhost:9200/my_index/my_type/1000?fields=my_field'

In this case Elasticsearch retrieves only the document with _id = 1000 and the field my_field, so you can check the boolean value in the response:

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1000",
  "_version": 9,
  "found": true,
  "fields": {
    "my_field": [
      true
    ]
  }
}
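A minimal sketch of reading the flag out of such a response, assuming the response has already been parsed into a Python dict. The helper name read_boolean_field is hypothetical, and the sample dict simply mirrors the response shown above.

```python
def read_boolean_field(get_response, field):
    """Return the boolean stored in a Get API response, or None if it is absent."""
    if not get_response.get("found", False):
        return None                      # no document with that ID
    values = get_response.get("fields", {}).get(field)
    if not values:
        return None                      # document exists but the field was not returned
    return bool(values[0])               # values under "fields" come back as a list


# Same shape as the response shown above.
sample = {
    "_index": "my_index",
    "_type": "my_type",
    "_id": "1000",
    "_version": 9,
    "found": True,
    "fields": {"my_field": [True]},
}

print(read_boolean_field(sample, "my_field"))
```

Returning None for a missing document or field keeps "not found" distinct from "set to false".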

Looking at your question, I see that you haven't mentioned which Elasticsearch version you are using. There are a lot of factors that affect the performance of an Elasticsearch cluster.

However, assuming it is the latest Elasticsearch and considering that you are after a single value, the best approach is to change your query into a non-scoring, filtering query. Filters are quite fast in Elasticsearch and are easily cached. Making a query non-scoring avoids the scoring phase entirely (calculating relevance, etc.).

To do this:

GET localhost:9200/test_index/test_partition/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "my_field" : true
                }
            }
        }
    }
}

Note that we are using the search API. The constant_score query wraps the term query so that it runs as a filter, which should be inherently fast.

For more information, please refer to "Finding exact values".
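The sketch below combines the non-scoring filter with the "size": 0 idea from the question, so that only the hit count comes back. It is an illustration under assumptions rather than a tested recipe: the helper names are invented, the optional ids clause is an addition to narrow the search to one document, and the sample responses are hand-written. The total may be a plain number or an object with a "value" key depending on the Elasticsearch version, so both shapes are handled.

```python
import json


def build_filter_body(field, value=True, doc_id=None):
    """Build a non-scoring search body that returns only the hit count."""
    clauses = [{"term": {field: value}}]
    if doc_id is not None:
        # Restrict the filter to a single document ID (an addition for this example).
        clauses.append({"ids": {"values": [doc_id]}})
    return {
        "size": 0,  # do not fetch any documents, only the total
        "query": {"constant_score": {"filter": {"bool": {"must": clauses}}}},
    }


def total_hits(search_response):
    """Read the hit total, which is either an int or a {"value": ...} object."""
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


print(json.dumps(build_filter_body("my_field", True, doc_id="1000"), indent=2))

# Hand-written sample responses for illustration only.
older_style = {"hits": {"total": 1, "hits": []}}
newer_style = {"hits": {"total": {"value": 1, "relation": "eq"}, "hits": []}}
print(total_hits(older_style), total_hits(newer_style))
```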
