
How to set a GCP Cloud Monitoring (Stackdriver) alert policy period greater than 24 hours?

Original post: 2021-03-17 12:53:02. Tags: google-cloud-platform, stackdriver, google-cloud-stackdriver, google-cloud-monitoring

Currently, 24 hours is the longest period that can be set on a Cloud Monitoring (formerly Stackdriver) alert policy.

However, if you have a daily activity, like a database backup, it might take a little more or less time each day (e.g. it runs in 1 hour 10 minutes one day and 1 hour 12 minutes the next). In this case, you might not see your completion indicator until 24 hours and 2 minutes after the prior indicator. This causes Cloud Monitoring to issue an alert, because you are 2 minutes over the alerting window limit.
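The arithmetic behind that 2-minute overrun can be checked with a short sketch. The 01:00 start time and the dates are made up for illustration; only the two run times come from the example above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # longest alert period, per the question

# Hypothetical schedule: the backup starts at 01:00 every day.
day1_start = datetime(2021, 3, 16, 1, 0)
day2_start = day1_start + timedelta(days=1)

# Run times taken from the example: 1h10m on day one, 1h12m on day two.
day1_done = day1_start + timedelta(hours=1, minutes=10)
day2_done = day2_start + timedelta(hours=1, minutes=12)

# Time between two consecutive completion indicators.
gap = day2_done - day1_done
print(gap)           # 1 day, 0:02:00
print(gap > WINDOW)  # True: the completion signal arrives outside the window
```

Even though the job starts exactly on schedule, the gap between completions exceeds the window whenever a run takes longer than the previous one.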

Is there a way to better handle the variance in these alerts, such as a 25-hour look-back period?

2 Answers

Currently, there is no way to increase the period beyond 24 hours.

However, a feature request has already been opened for this.

You can follow it at this public link [1].

Cheers,

[1] https://issuetracker.google.com/175703606

I found a workaround for this problem.

1. Create a metric for when your job starts (e.g. started_metric)
2. Create a metric for when your job finishes (e.g. completed_metric)

Now create a two-part alert policy:

1. Require that started_metric occurs at least once per 24 hours
2. Require that completed_metric occurs at least once per 24 hours
3. Trigger only if both (1) and (2) are violated (i.e. both metrics have been missing for more than 24 hours)
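As a rough sketch, a policy like this could be written as a JSON body in the shape of the Cloud Monitoring API's AlertPolicy resource, with two metric-absence conditions joined by the AND combiner. The display names, the log-based metric prefix, and the gce_instance resource type are assumptions for illustration, not details from this answer. The body has not been validated against a real project, and a real policy may need additional fields such as aggregations.

```python
import json

DAY = "86400s"  # 24 hours, the maximum period discussed above


def absence_condition(metric_name: str) -> dict:
    """Condition that is met when `metric_name` reports no data for 24 hours."""
    return {
        "displayName": f"{metric_name} absent for 24h",  # hypothetical name
        "conditionAbsent": {
            # Assumption: a user-defined log-based metric on a GCE instance.
            "filter": (
                f'metric.type="logging.googleapis.com/user/{metric_name}" '
                'AND resource.type="gce_instance"'
            ),
            "duration": DAY,
        },
    }


policy = {
    "displayName": "Backup neither started nor completed in 24h",  # hypothetical
    "combiner": "AND",  # alert only when every condition is met
    "conditions": [
        absence_condition("started_metric"),
        absence_condition("completed_metric"),
    ],
}

print(json.dumps(policy, indent=2))
```

The key design choice is the AND combiner: a late completion on its own meets only one condition, so it does not open an incident.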

This works around the 24-hour job jitter issue: the completion signal might arrive more than 24 hours after the previous one, but the job should always start (e.g. as a cron job) within 24 hours.
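The reasoning can be illustrated with a small simulation of the two-condition logic. This is a sketch of the idea only and does not reproduce how Cloud Monitoring evaluates conditions internally; should_alert and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def should_alert(now, last_started, last_completed):
    """Alert only if BOTH metrics have been missing for more than 24 hours."""
    started_absent = now - last_started > WINDOW
    completed_absent = now - last_completed > WINDOW
    return started_absent and completed_absent


# Day one finished at 02:10. On day two the job started on time at 01:00
# and is still running at 02:11, i.e. 24h01m after the last completion.
now = datetime(2021, 3, 17, 2, 11)
late_but_running = should_alert(
    now,
    last_started=datetime(2021, 3, 17, 1, 0),
    last_completed=datetime(2021, 3, 16, 2, 10),
)
print(late_but_running)  # False: the recent start suppresses the alert

# Same moment, but the day-two job never started.
never_started = should_alert(
    now,
    last_started=datetime(2021, 3, 16, 1, 0),
    last_completed=datetime(2021, 3, 16, 2, 10),
)
print(never_started)  # True: both signals are older than 24 hours
```

One consequence of this logic is that a job which starts on schedule but never finishes would not trigger the policy, since the start signal keeps arriving within the window.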