Amazon Elasticsearch interpretation of FreeStorageSpace metrics

I have 6 instances of type m3.large.elasticsearch, with storage type "Instance".


I don't really understand what Average, Minimum, Maximum and so on mean here.

No logs are getting into my cluster right now, even though FreeStorageSpace is reported as 14.95 GB.


But my FreeStorageSpace graph for "Minimum" has reached zero!


What is happening here?

I was also confused by this. Minimum is the free space on a single data node: the one with the least free space. Sum is the free space of the entire cluster (the free space on all data nodes added together). I got this information from the following link:

http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
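
To make the difference between the statistics concrete, here is a minimal sketch. The per-node values are hypothetical (they are not taken from the question) and it assumes one FreeStorageSpace sample per data node per period. It shows how Minimum can sit at zero while Sum still reports free space for the cluster as a whole.

```python
from statistics import mean

# Hypothetical free space per data node, in MB (illustrative values only).
free_mb_per_node = [0.0, 3100.0, 2950.0, 3000.0, 2900.0, 3000.0]


def summarize(samples):
    """Aggregate one period of per-node samples into the four statistics."""
    return {
        "Minimum": min(samples),   # the single fullest node
        "Maximum": max(samples),   # the single emptiest node
        "Average": mean(samples),  # mean free space per node
        "Sum": sum(samples),       # free space of the whole cluster
    }


stats = summarize(free_mb_per_node)
for name, value in stats.items():
    print(f"{name}: {value:.1f} MB")
```

In this made-up case one node is completely full, so Minimum is 0 even though the cluster-wide Sum is still well above zero.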

We ran into the same confusion. Avg, Min, and Max are calculated across the individual nodes, while Sum combines the free/used space for the whole cluster.

We had assumed that Average FreeStorageSpace meant the average free storage space of the whole cluster, and set an alarm with the following calculation in mind:

  1. Per-day index size = 1 TB
  2. Max days to keep indices = 10

Hence we had an average utilization of 10 TB at any point in time. Assuming we would grow to 2x, i.e. 20 TB, our actual storage need as per https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html#aes-bp-storage, with a replication factor of 2, is:

(20 * 2 * 1.1 / 0.95 / 0.8) = 57.89 =~ 60 TB

So we provisioned 18 x 3.8 TB instances =~ 68 TB to accommodate the 2x requirement of 60 TB.
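
The sizing arithmetic above can be reproduced with a short sketch. The function and parameter names are my own, and reading the factors 1.1, 0.95 and 0.8 as indexing overhead, OS-reserved space and service overhead is an assumption based on the linked sizing guide; only the numbers themselves come from the answer.

```python
import math


def min_storage_tb(source_tb, copies=2, indexing_overhead=0.10,
                   os_reserved=0.05, service_overhead=0.20):
    """Minimum storage for `source_tb` of source data.

    `copies` is the total number of copies of the data (the "replication
    factor of 2" above); the other defaults reproduce 1.1, 0.95 and 0.8.
    """
    return (source_tb * copies * (1 + indexing_overhead)
            / (1 - os_reserved) / (1 - service_overhead))


required_tb = min_storage_tb(20)             # 2x of the 10 TB steady state
min_nodes = math.ceil(required_tb / 3.8)     # 3.8 TB of storage per node
provisioned_tb = 18 * 3.8                    # what was actually provisioned

print(f"required: {required_tb:.2f} TB")
print(f"minimum nodes at 3.8 TB each: {min_nodes}")
print(f"provisioned: {provisioned_tb:.1f} TB")
```

The formula alone would be satisfied by fewer nodes; the 18 nodes in the answer leave roughly 8 TB of headroom above the rounded 60 TB requirement, which is where the alarm threshold comes from.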

So we decided on an alarm: if we go below 8 TB of free storage, we have hit our 2x limit and should scale up. Hence we set the alarm:

FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Average + Duration=1 minute

FreeStorageSpace is in MB, hence 8 TB = 8388608 MB.

But we immediately got alerted, because the average free space per node was below 8 TB. It always will be: each node has only 3.8 TB of storage.
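
A quick check of why that alarm could never be quiet, using only the figures from this answer (18 nodes, 3.8 TB each, an 8 TB threshold). The variable names are illustrative, and the best case assumes every node is completely empty.

```python
MB_PER_TB = 1024 * 1024
NODES = 18
NODE_CAPACITY_TB = 3.8

THRESHOLD_MB = 8 * MB_PER_TB  # the alarm threshold, 8 TB expressed in MB

# Best case: every node is completely empty.
max_average_mb = NODE_CAPACITY_TB * MB_PER_TB      # per-node value
max_sum_mb = NODES * NODE_CAPACITY_TB * MB_PER_TB  # cluster-wide value

# An alarm on "<= threshold" fires permanently if even the best case is below it.
average_alarm_always_fires = max_average_mb <= THRESHOLD_MB
sum_alarm_always_fires = max_sum_mb <= THRESHOLD_MB

print(f"threshold: {THRESHOLD_MB} MB")
print(f"Average can reach at most {max_average_mb:.1f} MB")
print(f"Sum can reach at most {max_sum_mb:.1f} MB")
```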

After realizing that to get the cluster-wide free storage you need the Sum of FreeStorageSpace over 1 minute, we set the alarm as:

FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Sum + Duration=1 minute
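
For anyone scripting this, the alarm could be expressed roughly as below. This is a sketch, not the exact setup we used: the alarm name, domain name and account ID are placeholders, and mapping "4 datapoints within 5 minutes" to `DatapointsToAlarm=4` with `EvaluationPeriods=5` is my reading of the console wording. The block only builds the parameters; the boto3 call is left as a comment.

```python
def free_storage_alarm(domain_name, account_id, threshold_mb=8 * 1024 * 1024):
    """Build keyword arguments for CloudWatch's put_metric_alarm."""
    return {
        "AlarmName": f"{domain_name}-free-storage-low",  # placeholder name
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Sum",       # cluster-wide free space, not per node
        "Period": 60,             # 1-minute datapoints
        "EvaluationPeriods": 5,   # look at the last 5 minutes
        "DatapointsToAlarm": 4,   # 4 of those 5 must breach
        "Threshold": float(threshold_mb),
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }


# Placeholder domain and account ID.
params = free_storage_alarm("my-logs-domain", "123456789012")
print(params["Statistic"], params["Threshold"])

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```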

With that change the calculation checked out, and we were able to set the right alarms.

The same applies to the ClusterUsedSpace calculation.

You should also track the actual free-space percentage using CloudWatch metric math.

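The original screenshot is not available, so the following is only a sketch of one way to express a free-space percentage as a metric math query. The expression `100 * free / (free + used)`, the query IDs, and the use of the Sum statistic for ClusterUsedSpace are assumptions; check them against your own graphs before relying on them.

```python
def free_space_percent_queries(domain_name, account_id):
    """Build a MetricDataQueries list for CloudWatch's get_metric_data."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]

    def metric(query_id, metric_name):
        # Input metric: fetched at 1-minute granularity, not returned itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ES",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        metric("free", "FreeStorageSpace"),
        metric("used", "ClusterUsedSpace"),
        {
            "Id": "free_pct",
            "Expression": "100 * free / (free + used)",  # assumed formula
            "Label": "Free space (%)",
            "ReturnData": True,
        },
    ]


# Placeholder domain and account ID.
queries = free_space_percent_queries("my-logs-domain", "123456789012")
print(queries[-1]["Expression"])
```

The list can be passed as `MetricDataQueries` to `get_metric_data`, together with a start and end time.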
