
Elasticsearch improve query performance

I'm trying to improve query performance. It takes an average of about 3 seconds for simple queries which don't even touch a nested document, and it's sometimes longer.

curl "http://searchbox:9200/global/user/_search?n=0&sort=influence:asc&q=user.name:Bill%20Smith"

Even without the sort it takes seconds. Here are the details of the cluster:

1.4TB index size.
210m documents that aren't nested (About 10kb each)
500m documents in total. (nested documents are small: 2-5 fields).
About 128 segments per node.
3 nodes, m2.4xlarge (-Xmx set to 40g, machine memory is 60g)
3 shards.
Index is on amazon EBS volumes.
Replication 0 (have tried replication 2 with only little improvement)

I don't see any noticeable spikes in CPU/memory etc. Any ideas how this could be improved?

Garry's points about heap space are true, but it's probably not heap space that's the issue here.

With your current configuration, you'll have less than 60GB of page cache available, for a 1.5 TB index. With less than 4.2% of your index in page cache, there's a high probability you'll be needing to hit disk for most of your searches.
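As a rough check of those numbers, here is a small Python sketch of the arithmetic. It takes the figures from the question (3 nodes, 60g of RAM and a 40g heap each, 1.4TB index) and assumes that everything outside the heap is usable as page cache, which is optimistic. The function name is made up for illustration.

```python
def page_cache_share(nodes, ram_gb, heap_gb, index_gb):
    """Return (total page cache in GB, fraction of the index it can hold)."""
    # Optimistic assumption: everything outside the JVM heap is page cache.
    cache_gb = nodes * (ram_gb - heap_gb)
    return cache_gb, cache_gb / index_gb


# Figures from the question: 3 nodes, 60g RAM, -Xmx40g, 1.4TB index.
cache_gb, share = page_cache_share(nodes=3, ram_gb=60, heap_gb=40, index_gb=1.4 * 1024)
print(f"{cache_gb} GB of page cache covers {share:.1%} of the index")
```

Plugging in a smaller heap or more RAM per node shows how quickly (or slowly) that share grows.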

You probably want to add more memory to your cluster, and you'll want to think carefully about the number of shards as well. Just sticking to the default of five shards can cause skewed distribution across three nodes. If you had five shards in this case, you'd have two machines with 40% of the data each, and a third with just 20%. In either case, you'll always be waiting for the slowest machine or disk when doing distributed searches. This article on Elasticsearch in Production goes a bit more in depth on determining the right amount of memory.
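The 40/40/20 split can be reproduced with a few lines of Python. This is only a sketch: it assumes all shards are the same size and that they are spread as evenly as possible over the nodes, and the function name is hypothetical.

```python
def data_share_per_node(shards, nodes):
    """Fraction of the index held by each node, assuming equal-sized shards
    spread as evenly as possible across the nodes."""
    base, extra = divmod(shards, nodes)
    # The first `extra` nodes each carry one more shard than the rest.
    counts = [base + 1 if i < extra else base for i in range(nodes)]
    return [count / shards for count in counts]


print(data_share_per_node(5, 3))  # [0.4, 0.4, 0.2]
print(data_share_per_node(3, 3))  # three equal shares of one third each
```

A shard count that is a multiple of the node count keeps the shares equal, so no single machine holds more than its part of the data.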

For this exact search example, you can probably use filters, though. You're sorting, thus ignoring the score calculated by the query. With a filter, it'll be cached after the first run, and subsequent searches will be quick.
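For illustration, this is roughly what a filter-only version of the search could look like as a request body, built here in Python. It uses the `bool` query with a `filter` clause, which is the form in recent Elasticsearch versions; versions from around the time of this thread used a different syntax, so treat the exact body as an assumption to check against your version. The helper name is made up, and the body would be sent to the same `/global/user/_search` endpoint as in the question.

```python
import json


def filter_only_search(query_text, sort_field, order="asc"):
    """Build a search body that runs the query in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                # Clauses under "filter" match documents without computing scores.
                "filter": [{"query_string": {"query": query_text}}]
            }
        },
        "sort": [{sort_field: {"order": order}}],
    }


# Same query string and sort as the curl example in the question.
body = filter_only_search("user.name:Bill Smith", "influence")
print(json.dumps(body, indent=2))
```

Since the results are sorted by `influence`, nothing is lost by skipping the score. How much of the filter gets cached depends on the Elasticsearch version.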

Ok, a few things here:

  1. Decrease your heap size. You have a heap of over 32gb dedicated to each Elasticsearch instance. Java doesn't compress pointers on heaps over 32gb. Drop your nodes to just under 32gb and, if you need to, spin up another instance.
  2. If spinning up another instance isn't an option and 32gb on 3 nodes isn't enough to run ES, then you'll have to bump your heap memory to somewhere over 48gb, so that the extra memory outweighs the cost of uncompressed pointers!
  3. I would probably stick with the default settings for shards and replicas: 5 shards, 1 replica. However, you can tweak the shard settings to suit. What I would do is reindex the data into several indices under several different conditions. The first index would have only 1 shard, the second index would have 2 shards, and I'd do this all the way up to 10 shards. Query each index and see which performs best. If the 10 shard index is the best performing one, keep increasing the shard count until you get worse performance; then you've hit your shard limit.
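The shard experiment in point 3 can be sketched as a small search loop. This is a minimal sketch, not a benchmark harness: `measure_latency` is a hypothetical callable that you would write yourself (reindex into an index with that many shards, run your queries, return the average time), and `hard_limit` is an assumed safety cap that is not part of the advice above.

```python
def find_best_shard_count(measure_latency, initial_max=10, hard_limit=64):
    """Try 1..initial_max shards, then keep going while the largest count wins."""
    latencies = {n: measure_latency(n) for n in range(1, initial_max + 1)}
    best = min(latencies, key=latencies.get)
    n = initial_max
    # If the largest count tried so far is the fastest, try one more shard.
    while best == n and n < hard_limit:
        n += 1
        latencies[n] = measure_latency(n)
        if latencies[n] < latencies[best]:
            best = n
    return best, latencies


# Fake measurement for demonstration only: fastest at 13 shards.
best, results = find_best_shard_count(lambda shards: abs(shards - 13) + 1.0)
print(best)  # 13
```

With real measurements, run each query several times and compare averages, since a single run is noisy.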

One thing to think about though, sharding might increase search performance but it also has a massive effect on index time. The more shards the longer it takes to index a document...

You also have quite a bit of data stored; maybe you should look at Custom Routing too.
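The idea behind custom routing is that documents indexed with the same routing value land on the same shard, so a search that passes that value only has to visit one shard instead of all of them. The Python sketch below illustrates this under loud assumptions: the CRC32 hash is a stand-in (Elasticsearch uses its own hash function), the routing key `"emea"` is a made-up example, and both helper names are hypothetical. Only the `routing` request parameter itself is an actual Elasticsearch feature.

```python
import zlib
from urllib.parse import urlencode


def shard_for(routing_key, num_shards):
    """Pick a shard from a routing key (illustrative only)."""
    # Stand-in hash: Elasticsearch uses its own hash function, not CRC32.
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def routed_search_url(base_url, routing_key, query):
    """Search URL that carries the routing value used at index time."""
    return f"{base_url}/_search?" + urlencode({"routing": routing_key, "q": query})


# Hypothetical routing key; every document indexed with it maps to one shard.
print(shard_for("emea", 3))
print(routed_search_url("http://searchbox:9200/global/user", "emea", "user.name:Bill Smith"))
```

This only helps if your queries can always supply the routing value, and a poorly chosen key can leave one shard much larger than the others.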
