

Performance of Terms Query with many elements

I'm planning to use a Terms Query with many terms (up to 40-50k, depending on the case) in all my queries.

These terms will be fetched from another index using a terms lookup, as explained here. Elasticsearch fetches them internally, so at least they won't travel over the wire, but the query itself looks quite heavy.
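For reference, a terms lookup sends only a pointer to the document that holds the term list, and Elasticsearch fetches the list itself. The sketch below builds such a request body as a plain Python dict. The field, index, document ID, and path names (`user_id`, `allowed_users`, `group-42`, `ids`) are hypothetical, the helper `terms_lookup_query` is made up for illustration, and the `index`/`id`/`path` syntax (without `type`) assumes Elasticsearch 7 or later.

```python
import json


def terms_lookup_query(field: str, lookup_index: str, lookup_id: str, lookup_path: str) -> dict:
    """Hypothetical helper: build a search body whose terms filter reads its values from another document."""
    return {
        "query": {
            "bool": {
                # Filter context: the terms only restrict the result set, no scoring.
                "filter": [
                    {
                        "terms": {
                            field: {
                                "index": lookup_index,  # index holding the term list
                                "id": lookup_id,        # document holding the term list
                                "path": lookup_path,    # field inside that document
                            }
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    body = terms_lookup_query("user_id", "allowed_users", "group-42", "ids")
    print(json.dumps(body, indent=2))
```

Note that in Elasticsearch 7.0 and later, the `index.max_terms_count` index setting (65,536 by default) caps how many terms a terms query may contain, so 40-50k terms still fits under the default limit.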

I'm wondering whether query performance will be acceptable. I'm planning to run a stress test anyway, but I'm not sure whether this will scale well. Has anyone had experience with this kind of query, or does anyone know how Elasticsearch handles them internally?

Thank you!
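A stress test like the one mentioned above mostly comes down to timing the same query at increasing term counts. The following is a minimal sketch, assuming you wrap the real request in a callable of your own; `measure_latency` and the synthetic `term-…` values are hypothetical, and the stand-in workload below does not talk to Elasticsearch at all.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def measure_latency(
    run_query: Callable[[Sequence[str]], object],
    term_counts: Sequence[int],
    repeats: int = 20,
) -> Dict[int, Dict[str, float]]:
    """Call run_query with term lists of growing size and report wall-clock latency in ms."""
    results: Dict[int, Dict[str, float]] = {}
    for n in term_counts:
        samples: List[float] = []
        for r in range(repeats):
            # Fresh synthetic terms per repetition so that, against a real cluster,
            # repeated requests are less likely to be served from a cache.
            terms = [f"term-{r}-{i}" for i in range(n)]
            start = time.perf_counter()  # only the query call itself is timed
            run_query(terms)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[n] = {
            "median_ms": statistics.median(samples),
            "max_ms": max(samples),
        }
    return results


if __name__ == "__main__":
    # Stand-in workload; replace with a function that sends the real query.
    fake_query = lambda terms: sorted(set(terms))
    for n, stats in measure_latency(fake_query, [1_000, 10_000, 50_000], repeats=5).items():
        print(f"{n:>6} terms: median {stats['median_ms']:.2f} ms, max {stats['max_ms']:.2f} ms")
```

Reporting the median alongside the maximum makes it easier to tell a steady slowdown from occasional outliers as the term count grows.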

Performance degrades quickly once you go past a few hundred terms: https://github.com/elastic/elasticsearch/issues/18829

This was originally mentioned in the following uber thread: https://github.com/elastic/elasticsearch/issues/11511#issuecomment-224028056

ES searches for each term individually across your shards, so as more terms are added, the cluster gets bogged down. As with anything in Elasticsearch, though, tuning the number of shards (replicas, in your case), the node count, and other configuration options might help. I'd suggest performance testing to find out what you're dealing with, but don't expect much at first.
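The point that every term is searched on every shard can be illustrated with a toy model. This is not Elasticsearch's or Lucene's actual code; it is a simplified sketch assuming each shard is an inverted index (a dict from term to a set of document IDs) and that a terms filter simply unions the posting lists of all requested terms. Under that assumption, the number of term lookups grows with (shards searched) × (distinct terms). The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Postings = Dict[str, Set[int]]  # toy shard: term -> IDs of documents containing it


def terms_filter(postings: Postings, terms: Iterable[str]) -> Tuple[Set[int], int]:
    """OR together the posting lists of all distinct terms on one shard."""
    matches: Set[int] = set()
    lookups = 0
    for term in set(terms):                     # duplicate terms are looked up once
        lookups += 1                            # one term-dictionary lookup per distinct term
        matches |= postings.get(term, set())    # union that term's documents into the result
    return matches, lookups


def search_all_shards(shards: List[Postings], terms: List[str]) -> Tuple[List[Tuple[int, int]], int]:
    """Run the toy filter on every shard and count the total lookups performed."""
    hits: List[Tuple[int, int]] = []
    total_lookups = 0
    for shard_id, postings in enumerate(shards):
        docs, lookups = terms_filter(postings, terms)
        total_lookups += lookups
        hits.extend((shard_id, doc) for doc in sorted(docs))
    return hits, total_lookups


if __name__ == "__main__":
    shards = [{"a": {1, 2}, "b": {3}}, {"b": {7}, "c": {8}}]
    print(search_all_shards(shards, ["a", "b", "c", "d", "a"]))
    # ([(0, 1), (0, 2), (0, 3), (1, 7), (1, 8)], 8)
```

In this model, adding replicas does not reduce the per-query work, because each search still touches one copy of every shard. Replicas mainly help by spreading concurrent queries across more nodes.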

I opened an issue in the Elasticsearch repo about this, and, as I feared, this kind of query gets very slow with many terms, even when using a lookup.

I also mentioned this in the issue, but I stress tested it and confirmed it myself:

Filtering with around 20 thousand terms makes the query quite slow (more than 500 ms).
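For the 40-50k terms mentioned in the question, this measurement allows a rough back-of-the-envelope extrapolation. The sketch assumes latency grows linearly with the number of terms, which is a simplification (shard count, caching, and hardware also matter). Because 500 ms is a lower bound, the extrapolated figures are lower bounds too. `extrapolate_ms` is a hypothetical helper.

```python
def extrapolate_ms(measured_ms: float, measured_terms: int, target_terms: int) -> float:
    """Linearly scale a measured latency to another term count (assumes cost is proportional to terms)."""
    return measured_ms * target_terms / measured_terms


# More than 500 ms for 20,000 terms means at least 0.025 ms (25 µs) per term.
for target in (40_000, 50_000):
    print(target, extrapolate_ms(500.0, 20_000, target))
# 40000 1000.0
# 50000 1250.0
```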

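Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.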
