Notification on failed GitHub WebHooks?

My company uses GitHub Enterprise to automatically update production and test servers when certain protected branches are updated.

When someone sends the push event, a payload is delivered to various servers, each running a small web server to receive such payloads. The web server then checks the "ref" element of the payload to see whether the updated branch corresponds to that server.

For example, when someone sends the push event to the development branch, this is the start of the payload that the WebHook delivers to two servers, prod01 and dev01.

{
  "ref": "refs/heads/development",
  "before": "e9f64fa5a4bec5f68faf9533050097badf1c4c1f",
  "after": "e86956f39a26e85b850b81643332def33e7f15c6",
  "created": false,
  "deleted": false,
...
}

The prod01 server checks to see if the production branch was updated. It wasn't, so nothing happens on that server. The server dev01 checks the same payload to see if the development branch was updated. It was ("ref": "refs/heads/development"), so dev01 runs the following commands.

git -C /path/to/dev01/repo reset --hard
git -C /path/to/dev01/repo clean -f
git -C /path/to/dev01/repo pull origin development
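The branch check each receiving web server performs could look roughly like the following. This is only a minimal sketch, not our actual receiver: the function names `branch_from_ref` and `update_commands` are hypothetical, and it assumes the payload arrives as JSON text with a "ref" of the form refs/heads/<branch>, as in the example above.

```python
import json


def branch_from_ref(ref):
    """Return the branch name for a 'refs/heads/<branch>' ref, else None."""
    prefix = "refs/heads/"
    if ref.startswith(prefix):
        return ref[len(prefix):]
    return None


def update_commands(payload_text, server_branch, repo_path):
    """Return the git commands this server should run for the payload.

    An empty list means the push was for a different branch, so the
    server does nothing.
    """
    payload = json.loads(payload_text)
    if branch_from_ref(payload.get("ref", "")) != server_branch:
        return []
    git = ["git", "-C", repo_path]
    # Same three commands as listed above, as argument lists.
    return [
        git + ["reset", "--hard"],
        git + ["clean", "-f"],
        git + ["pull", "origin", server_branch],
    ]
```

Each returned list could be handed to `subprocess.run(..., check=True)`, so that a failing git command is reported back to GitHub as a failed delivery rather than swallowed.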

When the payload is delivered correctly, GitHub Enterprise returns this.

[Image: working payload]

But sometimes the web server isn't running on prod01 or dev01, so we get this instead.

[Image: failed payload, "We couldn't deliver this payload: Service Timeout"]

When this happens, our workflow of updating the repository and expecting the server to pick up the same changes breaks down.

How can I be notified of failed payloads? I'd rather not set up something to poll the web servers or poll for bad statuses, if that's possible. Barring that, any solution that checks the status (RESTfully?) of the payload is better than checking to see if the web server is still running, since the payload may still fail for other reasons.

Edit: I've checked internally, and it looks like we could probably set up one of our current monitoring services to check for responses on the web server's port on each server. In the image above, it's 8090, but it frequently differs.

This isn't my ideal solution, since it only really covers the case when the web server is not responding. There are a variety of other reasons why the payload delivery might fail.

I would stand up a small Jenkins instance, if I didn't have one already. Then create a separate webhook, firing on the same events, that calls a Jenkins job. The job basically counts to some arbitrary number (1000) and then checks the target servers to see whether the payload was delivered to them. That way it wouldn't have to monitor constantly, and it would fire at the same time as your webhook.

Of course, the Jenkins solution falls down if the Jenkins webhook also fails, so you would have to work to make that connection really bulletproof. That, of course, might be counterproductive, and the time might be better spent elsewhere.
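As a rough illustration of the check such a Jenkins job might run after its delay, the sketch below compares the commit each target server has checked out against the "after" hash from the push payload. Treat it as an assumption about how "checking the target servers" could be done, not as a tested setup: `head_commit` and `stale_servers` are hypothetical names, and how the job reaches each server (SSH, an agent, a shared mount) is left open.

```python
import subprocess


def head_commit(repo_path):
    """Return the commit hash currently checked out in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def stale_servers(expected_sha, deployed):
    """Return the names of servers that are not on expected_sha.

    deployed maps a server name to the commit hash that server reports,
    for example the value returned by head_commit() on that server.
    """
    return sorted(name for name, sha in deployed.items() if sha != expected_sha)
```

If `stale_servers` returns a non-empty list, the job can fail the build, and Jenkins' usual failure notifications then act as the alert.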

It is too bad that the GitHub Enterprise API doesn't seem to offer any way to see the response codes for the requests. The API can of course show the payloads of the requests, but that obviously won't help you.

There are two options:

Real-time Monitoring

Configure log forwarding and monitor for failed events in hookshot_resque with error codes 422 or 504.

Cron-based Monitoring

A user with administrative shell access to your instance can check for failed events using the command-line utility ghe-webhook-logs. For example:

Show all failed hook deliveries in the past day:

ghe-webhook-logs -f -a YYYYMMDD

The next step is to parse and automate the command. While this introduces a delay in detecting a failed webhook, it is the most robust and reliable method available.
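A possible starting point for the automation is to have the cron job build the command with yesterday's date filled in. The sketch below does only that; `failed_deliveries_command` is a hypothetical helper, and the default date format simply follows the YYYYMMDD placeholder above, which may not match every GitHub Enterprise version, so check it against your instance.

```python
from datetime import date, timedelta


def failed_deliveries_command(today, days_back=1, date_format="%Y%m%d"):
    """Build the ghe-webhook-logs call for failures since `days_back` days ago.

    date_format is an assumption taken from the YYYYMMDD placeholder;
    adjust it if your instance expects a different date format.
    """
    since = today - timedelta(days=days_back)
    return ["ghe-webhook-logs", "-f", "-a", since.strftime(date_format)]


if __name__ == "__main__":
    # Print the command a daily cron job would run.
    print(" ".join(failed_deliveries_command(date.today())))
```

The resulting list can be passed to `subprocess.run` from the administrative shell. Parsing is deliberately left out, because the output format of ghe-webhook-logs is not shown here; inspect a real run before deciding what counts as a failure worth alerting on.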
