
How many racks should I set in my ElasticSearch cluster?

My ES cluster has 20 machines running 50 nodes (ES instances), and I'm not sure how many racks I should configure. Are two racks enough, or would 3 or 4 racks be better?

As far as I know, setting rack_id in the ES configuration provides the following:

1. Control where shards are allocated and relocated (to make sure replicas end up in different racks)
2. Use rack_id for document routing

Is there any reason to set up more racks? My impression is that even the default of a single rack is good enough.

Two machines are most likely to fail together if they share hardware (for example, VMs on the same physical host), less likely if they share a rack but not hardware, and less likely still if they share a building but not a rack. So it makes sense to use more than a single rack.

Whether you need more than 2 racks depends on your number of replicas. The default number of replicas is 1. If you require a higher value, then strictly speaking you degrade the availability of your cluster a bit by using only 2 racks: with 3 or more copies of each shard, at least two copies have to share a rack, so the extra copies are not effective at the rack level.
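To make the replica arithmetic above concrete, here is a minimal sketch. It assumes shard copies are spread as evenly as possible across racks (which is what rack awareness aims for), and the function names are hypothetical helpers, not part of any Elasticsearch API.

```python
import math


def max_copies_in_one_rack(num_replicas: int, num_racks: int) -> int:
    """Largest number of copies of one shard that must share a rack.

    Assumes copies (1 primary + num_replicas) are spread as evenly as
    possible across num_racks racks.
    """
    if num_replicas < 0 or num_racks < 1:
        raise ValueError("need num_replicas >= 0 and num_racks >= 1")
    copies = num_replicas + 1  # the primary counts as a copy too
    return math.ceil(copies / num_racks)


def copies_left_after_rack_failure(num_replicas: int, num_racks: int) -> int:
    """Copies of a shard that survive losing the most heavily loaded rack."""
    copies = num_replicas + 1
    return copies - max_copies_in_one_rack(num_replicas, num_racks)


if __name__ == "__main__":
    # Compare 1 and 2 replicas on 2, 3 and 4 racks.
    for replicas in (1, 2):
        for racks in (2, 3, 4):
            left = copies_left_after_rack_failure(replicas, racks)
            print(f"replicas={replicas} racks={racks} -> copies left: {left}")
```

With 2 replicas, two racks leave only one surviving copy after a rack failure, the same as with 1 replica, while three racks leave two.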

I think that in your case it's simpler to just set cluster.routing.allocation.same_shard.host to true (see https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html ). This prevents copies of the same shard from being placed on the same host (a host is identified by its address and host name). Please test this before going to production with this approach.
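As a rough sketch of what that setting could look like in practice, the snippet below builds a JSON body for the cluster update settings API (PUT _cluster/settings). The helper name is hypothetical. Using "persistent" and updating the setting dynamically are assumptions that hold for recent versions, so check the documentation for your version; otherwise put the same key in elasticsearch.yml. The optional awareness attribute is only there to show how rack_id from the question would fit in.

```python
import json


def build_allocation_settings(awareness_attribute=None):
    """Build a cluster-settings body that keeps shard copies off the same host.

    awareness_attribute is optional, e.g. "rack_id" if the nodes are
    tagged with such an attribute (an assumption about your setup).
    """
    settings = {"cluster.routing.allocation.same_shard.host": True}
    if awareness_attribute is not None:
        key = "cluster.routing.allocation.awareness.attributes"
        settings[key] = awareness_attribute
    return {"persistent": settings}


if __name__ == "__main__":
    # Print the body; send it with whatever HTTP client you already use.
    print(json.dumps(build_allocation_settings("rack_id"), indent=2))
```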

Also, keep in mind that you need to set the processors setting ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html#processors ) accordingly. Each ES node detects the number of cores available on the machine and is not aware of other nodes running there. With multiple nodes on the same machine, each node assumes it has dedicated access to all cores, which is a problem because the default thread pool sizes are derived from the core count. So you will want to explicitly specify the number of cores available to each node via the processors setting, so that the thread pools are not over-allocated.
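One simple way to pick the value is to split the machine's cores evenly among the nodes running on it. The sketch below does just that; the helper name and the even split are assumptions, and newer Elasticsearch versions call this setting node.processors rather than processors.

```python
def processors_per_node(total_cores: int, nodes_on_machine: int) -> int:
    """Split a machine's cores evenly among its ES nodes (at least 1 each)."""
    if total_cores < 1 or nodes_on_machine < 1:
        raise ValueError("need at least one core and one node")
    return max(1, total_cores // nodes_on_machine)


if __name__ == "__main__":
    # Hypothetical machine: 16 cores shared by 3 ES nodes.
    value = processors_per_node(16, 3)
    # Line to put in each node's elasticsearch.yml on that machine.
    print(f"processors: {value}")
```

With 50 nodes on 20 machines, most machines host 2 or 3 nodes, so the value may differ from machine to machine.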

I recommend using dedicated master nodes. To ensure cluster stability, each dedicated master node instance should run on its own machine (this can certainly be a much smaller machine, e.g. 4 GB of RAM to start with).
