Notification on failed GitHub WebHooks?

My company uses GitHub Enterprise to automatically update production and test servers when certain protected branches are updated.

When someone pushes, GitHub delivers a push-event payload to various servers, each running a small web server to receive such payloads. The web server then checks the payload's "ref" element to see whether the updated branch is the one that server tracks.

For example, when someone pushes to the development branch, this is the start of the payload that the WebHook delivers to the two servers, prod01 and dev01.

{
  "ref": "refs/heads/development",
  "before": "e9f64fa5a4bec5f68faf9533050097badf1c4c1f",
  "after": "e86956f39a26e85b850b81643332def33e7f15c6",
  "created": false,
  "deleted": false,
...
}

The prod01 server checks whether the production branch was updated. It wasn't, so nothing happens there. The dev01 server checks the same payload to see whether the development branch was updated. It was ("ref": "refs/heads/development"), so dev01 runs the following commands.

git -C /path/to/dev01/repo reset --hard
git -C /path/to/dev01/repo clean -f
git -C /path/to/dev01/repo pull origin development
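A minimal sketch of such a receiver, in Python's standard library. The port (8090) and repository path are taken from the question; the branch check and the git commands mirror the ones above.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

REPO_PATH = "/path/to/dev01/repo"   # path from the question's example
BRANCH = "development"              # the branch this server tracks

def ref_matches(payload: dict, branch: str) -> bool:
    """True if the push payload's "ref" targets the given branch."""
    return payload.get("ref") == f"refs/heads/{branch}"

def update_repo(repo_path: str, branch: str) -> None:
    """Run the same git commands as in the question."""
    for args in (["reset", "--hard"], ["clean", "-f"], ["pull", "origin", branch]):
        subprocess.run(["git", "-C", repo_path] + args, check=True)

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        payload = json.loads(body)
        if ref_matches(payload, BRANCH):
            update_repo(REPO_PATH, BRANCH)
        # Reply 200 either way so GitHub records the delivery as successful.
        self.send_response(200)
        self.end_headers()

# To run the receiver:
#     HTTPServer(("", 8090), HookHandler).serve_forever()
```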

When the payload is delivered correctly, GitHub Enterprise returns this.

[Screenshot: the delivery listed as successful under "Payload"]

But sometimes the web server isn't running on prod01 or dev01, so we get this, instead.

Failed payload: "We couldn't deliver this payload: Service Timeout"

When this happens, our workflow of pushing to the repository and expecting the servers to have the same changes breaks.

How can I be notified of failed payloads? I'd rather not set up something to poll the web servers or poll for bad statuses, if possible. Barring that, any solution that checks the payload's delivery status (RESTfully?) would be better than checking whether the web server is still running, since the payload may still fail for other reasons.

Edit: I've checked internally, and it looks like we could probably set up one of our current monitoring services to check for responses on the web server's port on each server. In the image above it's 8090, but it frequently differs.

This isn't my ideal solution, since it only really covers the case when the web server is not responding. There are a variety of other reasons why the payload delivery might fail.
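For completeness, the port check that monitoring service would perform can be sketched in a few lines; the hostnames and port 8090 below are just the question's examples.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example sweep over the hook receivers (hostnames/port from the question):
# for host in ("prod01", "dev01"):
#     if not port_open(host, 8090):
#         print(f"ALERT: webhook receiver on {host}:8090 is unreachable")
```

As noted above, this only catches the receiver being down, not any other delivery failure.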

How I would do it would be to stand up a small Jenkins instance, if I didn't have one already. Then I'd create a separate webhook, firing on the same events, that triggers a Jenkins job which waits for some arbitrary delay (say, counting to 1000) and then checks the target servers to see whether the payload arrived. That way nothing has to monitor constantly, and the check fires at the same time as your webhook.

Of course, the Jenkins solution falls down if the Jenkins webhook also fails, so you would have to make that connection really bulletproof, which might be counter-productive and time better spent elsewhere.
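The check that delayed Jenkins job would run can be sketched as follows: compare each server's current HEAD against the "after" SHA from the push payload. The function names and the 60-second delay are illustrative, not part of any existing tooling.

```python
import subprocess
import time

def local_head(repo_path: str) -> str:
    """Current commit SHA of the server's checkout."""
    out = subprocess.run(["git", "-C", repo_path, "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def head_matches(local_sha: str, expected_sha: str) -> bool:
    """True if the checkout is at the commit the push payload announced."""
    return local_sha.strip().lower() == expected_sha.strip().lower()

def verify_after_delay(repo_path: str, expected_sha: str, delay_s: int = 60) -> bool:
    """What the Jenkins job would do: wait, then check the server's HEAD
    against the "after" field of the push payload."""
    time.sleep(delay_s)
    return head_matches(local_head(repo_path), expected_sha)
```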

It is too bad there doesn't seem to be any way in the GitHub Enterprise API to see the response codes for the deliveries. The API can of course show the payloads of the requests, but that obviously won't help you.

There are two options:

Real-time Monitoring

Configure log forwarding and monitor hookshot_resque for failed events with error codes 422 or 504.
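What the forwarded hookshot_resque lines look like depends on your forwarder; as a sketch, a generic substring filter on the collected log stream might look like this (the matching rules are an assumption to adjust to your actual log format):

```python
def failed_webhook_lines(log_lines):
    """Keep forwarded log lines that mention hookshot together with a
    422 or 504 status. Substring matching is an assumption; adapt it
    to the real format your log forwarder produces."""
    return [line for line in log_lines
            if "hookshot" in line and ("422" in line or "504" in line)]
```

Feed the result into whatever alerting channel your monitoring stack already has.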

Cron-based Monitoring

A user with administrative shell access to your instance can check for failed events using the command-line utility ghe-webhook-logs. For example:

# show all failed hook deliveries in the past day
ghe-webhook-logs -f -a YYYYMMDD

The next step is to run the command on a schedule and parse its output. While this introduces a delay in detecting a failed webhook, it is the most robust and reliable method available.
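A cron-driven wrapper could be as simple as the following sketch. The -f and -a flags are the ones shown above; the helper names are hypothetical, and rather than guess at the command's output format, it alerts whenever the failed-delivery filter prints anything at all.

```python
import subprocess
from datetime import date

def failed_deliveries_today() -> str:
    """Run ghe-webhook-logs for today's failed deliveries (-f filters to
    failures, -a takes a YYYYMMDD date) and return its raw output."""
    day = date.today().strftime("%Y%m%d")
    out = subprocess.run(["ghe-webhook-logs", "-f", "-a", day],
                         capture_output=True, text=True, check=True)
    return out.stdout

def should_alert(output: str) -> bool:
    """Since -f already filters to failed deliveries, any non-empty
    output means at least one failure happened."""
    return bool(output.strip())
```

Schedule it with cron on the admin shell and route should_alert's result to email, chat, or your monitoring system.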
