Elasticsearch cluster design for ~200G logs a day

I've created an ES cluster (version 5.4.1) with 4 data nodes, 3 master nodes, and one client node (Kibana).

The data nodes are r4.2xlarge AWS instances (61 GB memory, 8 vCPUs) with 30 GB of memory allocated to the Elasticsearch JVM.

We're writing around 200 GB of logs every day and keeping them for 14 days.
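For a rough sense of scale, the retention policy pins down the storage requirement. A back-of-the-envelope calculation, assuming one replica per shard (the Elasticsearch default; the question doesn't state the replica count):

    # Rough storage estimate for 200 GB/day retained for 14 days.
    daily_gb = 200
    retention_days = 14
    replicas = 1          # assumed; Elasticsearch default
    data_nodes = 4

    primary_gb = daily_gb * retention_days        # 2800 GB of primary data
    total_gb = primary_gb * (1 + replicas)        # 5600 GB including replica copies
    per_node_gb = total_gb / data_nodes           # ~1400 GB per data node

    print(f"primary: {primary_gb} GB, total: {total_gb} GB, "
          f"per node: {per_node_gb:.0f} GB")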

I'm looking for recommendations to improve the cluster's performance, especially search performance (Kibana).

More data nodes? More client nodes? Bigger nodes? More replicas? Anything that can improve performance is an option.

Has anyone built something close to this design or run similar loads? I'd be happy to hear about other designs and loads.

Thanks, Moshe

  1. How many shards are you using? The default of 5? That would actually be a pretty good number. Depending on who you ask, a shard should be between 10 GB and 50 GB; with a logging use case, probably closer to the 50 GB end (see the worked example after this list).
  2. Which queries do you want to speed up? Do they mainly target recent data or long time spans? If you are mainly interested in recent data, you could use different node types in a hot-warm architecture: more powerful nodes hold the recent, frequently queried data, while the bulk of older, less frequently accessed data sits on less powerful nodes (see the allocation sketch after this list).
  3. Generally you'll need to find your bottleneck. I'd install the free monitoring plugin and take a look at how both Kibana and Elasticsearch are doing (the stats sketch after this list is a starting point).
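On point 1, the arithmetic works out neatly for this cluster. A quick worked example, assuming one index per day with the 5.x default of 5 primary shards and 1 replica (the question states neither):

    # Shard-size check for daily indices.
    daily_gb = 200
    primary_shards = 5    # Elasticsearch 5.x default, assumed here
    replicas = 1          # assumed default
    retention_days = 14

    shard_gb = daily_gb / primary_shards                             # 40 GB per primary shard
    open_shards = primary_shards * (1 + replicas) * retention_days   # 140 shards held

    print(f"{shard_gb:.0f} GB per shard; {open_shards} shards on the cluster")
    # 40 GB per shard sits inside the suggested 10-50 GB band,
    # so the default shard count is reasonable for this volume.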
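On point 2, hot-warm works by tagging each node with a custom attribute (for example node.attr.box_type: hot or node.attr.box_type: warm in elasticsearch.yml, the usual convention on 5.x) and steering indices with an allocation filter. A minimal sketch over the REST API using Python's requests; the host and index name are assumptions, not from the question:

    import requests

    ES = "http://localhost:9200"   # assumed client-node address

    # Index templates would pin new indices to box_type=hot; once an
    # index ages out of the heavy-query window, flip its allocation
    # filter and Elasticsearch relocates it to the warm nodes.
    def move_to_warm(index: str) -> None:
        resp = requests.put(
            f"{ES}/{index}/_settings",
            json={"index.routing.allocation.require.box_type": "warm"},
        )
        resp.raise_for_status()

    # e.g. run daily against the index that just aged out (name hypothetical):
    move_to_warm("logstash-2017.06.18")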
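On point 3, even before a monitoring plugin is in place, the stats APIs give a first read on where the cluster is struggling. A rough sketch under the same assumed host; sustained heap above ~75% or pegged CPU points at the data nodes rather than at disk:

    import requests

    ES = "http://localhost:9200"   # assumed client-node address

    health = requests.get(f"{ES}/_cluster/health").json()
    print(health["status"], "-", health["active_shards"], "active shards")

    # Per-node JVM heap usage from the node stats API.
    stats = requests.get(f"{ES}/_nodes/stats/jvm,os").json()
    for node in stats["nodes"].values():
        print(node["name"], f'heap {node["jvm"]["mem"]["heap_used_percent"]}%')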

Wild guess: you are limited by IO. Prefer local disks over EBS, prefer SSDs over spinning disks, and if you can, get as many IOPS as you can afford for this use case. The sampling sketch below shows one way to measure it.
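To test that guess, the fs section of the node stats reports cumulative disk operation counters on Linux; sampling it twice and diffing gives the live IOPS per node. A sketch under the same assumptions as above:

    import time
    import requests

    ES = "http://localhost:9200"   # assumed client-node address

    def io_totals():
        # io_stats is only populated on Linux hosts.
        stats = requests.get(f"{ES}/_nodes/stats/fs").json()
        return {n["name"]: n["fs"].get("io_stats", {}).get("total", {})
                for n in stats["nodes"].values()}

    before = io_totals()
    time.sleep(60)
    after = io_totals()

    for name, t in after.items():
        reads = t.get("read_operations", 0) - before[name].get("read_operations", 0)
        writes = t.get("write_operations", 0) - before[name].get("write_operations", 0)
        print(f"{name}: {reads / 60:.0f} read ops/s, {writes / 60:.0f} write ops/s")

Comparing those numbers against the EBS volumes' provisioned IOPS shows how close to the ceiling the data nodes are running.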
