
Optimize an Elasticsearch query using filter, query, or a mix of both

I am trying to improve the performance of an Elasticsearch query. The goal of the query is just to retrieve the documents that match, so the score does not matter. It is also worth mentioning that we have one index per day. As far as I know, in this case it is better to use a filter, which avoids calculating scores, but I also read that there are alternatives that use a filter inside a query so that every document gets a score of 1. The first query I wrote was the following:

{
 "filter": {
  "bool": {
   "must": [{
     "match": {
      "from": "john.doe@example.com"
     }
    }, {
     "range": {
      "receivedDate": {
       "gte": "date1",
       "lte": "date2"
      }
     }
    }
   ]
  }
 }
}

Then I ran my first test: I replaced "filter" with "query", and most of the time I got better times with "query" than with "filter". That is my first question: why? What am I doing wrong in my query that makes the filter slower than the query?

After that I kept reading, trying to improve it, and came up with this:

{
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [{
                            "match": {
                                "from": "john.doe@example.com"
                            }
                        }, {
                            "range": {
                                "receivedDate": {
                                    "gte": "date1",
                                    "lte": "date2"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

With the latter, my impression is that performance improved a little. So, based on your experience, could you tell me which one is better (at least in theory) for getting faster results? Also, is there a chance that one of these queries caches its results and so speeds up subsequent queries? Is there a better way to write this query? Thanks in advance for your help. I forgot to mention that I am using Elasticsearch v2.3.

In your first query, you were only using a post_filter, which is applied to the hits after the query itself has run. Your second query is the way to go, but it can be optimized to this (there is no need for the match_all clause or for a nested bool/must inside the filter):

{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "receivedDate": {
              "gte": "date1",
              "lte": "date2"
            }
          }
        },
        {
          "term": {
            "from": "john.doe@example.com"
          }
        }
      ]
    }
  }
}
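
If the request body is assembled in application code, the same structure can be built programmatically. The following is only a minimal sketch in Python: the helper names build_filter_query and build_constant_score_query are hypothetical and not part of any Elasticsearch client. The second helper shows the constant_score form, which is probably the "filter inside a query" alternative mentioned in the question. The term clause assumes that the from field is indexed without analysis (not_analyzed in 2.x); on an analyzed field the original match clause would be the safer choice.

```python
import json


def build_filter_query(sender, start, end):
    """Return a search body that runs both conditions in filter context.

    Hypothetical helper; the field names come from the queries above.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"receivedDate": {"gte": start, "lte": end}}},
                    # Assumption: "from" is not analyzed, so the exact
                    # address is stored as a single term.
                    {"term": {"from": sender}},
                ]
            }
        }
    }


def build_constant_score_query(sender, start, end):
    """Return the same conditions wrapped in a constant_score query."""
    clauses = build_filter_query(sender, start, end)["query"]["bool"]["filter"]
    return {
        "query": {
            "constant_score": {
                "filter": {"bool": {"must": clauses}}
            }
        }
    }


if __name__ == "__main__":
    # "date1" and "date2" are the placeholders used in the question.
    body = build_filter_query("john.doe@example.com", "date1", "date2")
    print(json.dumps(body, indent=2))
```

Either dictionary can be serialized with json.dumps and sent as the body of a _search request.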
