Optimize ES query with too many terms elements

We are processing a dataset of billions of records. Currently all of the data is stored in Elasticsearch, and all queries and aggregations are performed with Elasticsearch.

The simplified query body is shown below. We put the device ids in terms clauses and then concatenate those clauses with should to avoid the limit of 1024 values per terms clause. The total number of terms elements is up to 100,000, and the query has now become very slow.

{
    "_source": {
        "excludes": [
            "raw_msg"
        ]
    },
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "create_ms": {
                                    "gte": 1664985600000,
                                    "lte": 1665071999999
                                }
                            }
                        }
                    ],
                    "should": [
                        {
                            "terms": {
                                "device_id": [
                                    "1328871",
                                    "1328899",
                                    "1328898",
                                    "1328934",
                                    "1328919",
                                    "1328976",
                                    "1328977",
                                    "1328879",
                                    "1328910",
                                    "1328902",
                                    ...       # more values; since terms does not support more than 1024 values, we concatenate them all with should
                                ]
                            }
                        },
                        {
                            "terms": {
                                "device_id": [
                                    "1428871",
                                    "1428899",
                                    "1428898",
                                    "1428934",
                                    "1428919",
                                    "1428976",
                                    "1428977",
                                    "1428879",
                                    "1428910",
                                    "1428902",
                                    ...
                                ]
                            }
                        },
                        ...  # concatenate more terms clauses until all of the 100,000 values are included
                    ],
                    "minimum_should_match": 1
                }
            }
        }
    },
    "aggs": {
        "create_ms": {
            "date_histogram": {
                "field": "create_ms",
                "interval": "hour"
            }
        }
    },
    "size": 0
}
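
For illustration, the chunking described above could be assembled in Python roughly as follows. This is only a sketch of how such a body might be built, not the asker's actual code: the helper names `chunked` and `build_query` are hypothetical, the chunk size of 1024 is taken from the description, the device ids are made up, and the outer `bool` wrapper around `filter` is an assumption made so that the body is valid query DSL.

```python
import json
from typing import Dict, Iterator, List

CHUNK_SIZE = 1024  # limit cited in the question; treated here as an assumption


def chunked(values: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_query(device_ids: List[str], gte_ms: int, lte_ms: int) -> Dict:
    """Build a request body with one `terms` clause per chunk of device ids."""
    should = [
        {"terms": {"device_id": chunk}}
        for chunk in chunked(device_ids, CHUNK_SIZE)
    ]
    return {
        "_source": {"excludes": ["raw_msg"]},
        "query": {
            "bool": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"create_ms": {"gte": gte_ms, "lte": lte_ms}}}
                        ],
                        "should": should,
                        "minimum_should_match": 1,
                    }
                }
            }
        },
        "aggs": {
            "create_ms": {
                "date_histogram": {"field": "create_ms", "interval": "hour"}
            }
        },
        "size": 0,
    }


if __name__ == "__main__":
    # Made-up ids, only to show the size of the resulting request.
    ids = [str(1_000_000 + i) for i in range(100_000)]
    body = build_query(ids, 1664985600000, 1665071999999)
    print(len(body["query"]["bool"]["filter"]["bool"]["should"]))  # 98 terms clauses
    print(len(json.dumps(body)), "bytes of JSON")
```

With 100,000 ids this produces 98 separate terms clauses in a single request, which gives a sense of how large the body sent on every search becomes.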

My question: is there a way to optimize this case? Or is there a better choice for this kind of search?

Real-time or near-real-time results are a must; another engine is acceptable.

Simplified schema of the data:

    "id" : {
        "type" : "long"
    },
    "content" : {
        "type" : "text"
    },
    "device_id" : {
        "type" : "keyword"
    },
    "create_ms" : {
        "type" : "date"
    },
    ... # more fields

You can use the terms query with a terms lookup to specify a larger list of values.

Store your ids in a dedicated document with an id such as 'device_ids', then reference that document from the query:

"should": [
  {
    "terms": {
      "device_id": {
        "index": "your-index-name",
        "id": "device_ids",
        "path": "field-name"
      }
    }
  }
]
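
A minimal sketch of the two pieces this approach needs, written as plain Python dictionaries: the document that holds the ids, and a search body that references it. The index name `device-lists` and the field name `ids` are hypothetical placeholders (the answer leaves them as `your-index-name` and `field-name`), and sending either dictionary to the cluster requires a client that is not shown here. One caveat worth checking: Elasticsearch caps the number of terms in a terms query, including terms fetched through a lookup, via the `index.max_terms_count` setting (65,536 by default), so a list of 100,000 ids would presumably need that setting raised or the list split across several lookup documents.

```python
import json
from typing import Dict, List

LOOKUP_INDEX = "device-lists"  # hypothetical index holding the id list
LOOKUP_DOC_ID = "device_ids"   # document id suggested in the answer
LOOKUP_FIELD = "ids"           # hypothetical field containing the ids


def lookup_document(device_ids: List[str]) -> Dict:
    """Source of the document that stores the ids to match against."""
    return {LOOKUP_FIELD: device_ids}


def lookup_search_body(gte_ms: int, lte_ms: int) -> Dict:
    """Search body combining the time range with a terms lookup."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"create_ms": {"gte": gte_ms, "lte": lte_ms}}},
                    {
                        "terms": {
                            "device_id": {
                                "index": LOOKUP_INDEX,
                                "id": LOOKUP_DOC_ID,
                                "path": LOOKUP_FIELD,
                            }
                        }
                    },
                ]
            }
        },
        "aggs": {
            "create_ms": {
                "date_histogram": {"field": "create_ms", "interval": "hour"}
            }
        },
    }


if __name__ == "__main__":
    # The lookup document is written once; each search then stays small.
    print(json.dumps(lookup_document(["1328871", "1328899"])))
    print(json.dumps(lookup_search_body(1664985600000, 1665071999999), indent=2))
```

In this sketch both conditions sit in filter context rather than in should, since the request only aggregates and does not need relevance scores. The search body no longer grows with the number of device ids, because the list is resolved on the server from the stored document.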

