简体   繁体   中英

docker-compose healthcheck retry frequency != interval

I recently set up healthcheck s in my docker-compose config.

It is doing great and I like it. Here's a typical example:

services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s

My container is quite slow to boot, hence I set up a 30 seconds start_period .

But it doesn't really fit my expectation: I don't need check every 5 seconds, but I need to know when the container is ready for the first time as soon as possible for my orchestration, and since my start_period is approximative, if it is not ready yet at first check, I have to wait for interval before retry.

What I'd like to have is:

  • While container is not healthy, retry every 5 seconds
  • Once it is healthy, check every 1 minute

Ain't there a way to achieve this out-of-the-box with docker-compose ?

I could write a custom script to achieve this, but I'd rather have a native solution if it is possible.

Unfortunately, this is not possible out of the box.
All the duration set are final. They can't be changed depending on the container state.

However, according to the documentation , the probe does not seem to wait for the start_period to finish before checking your test. The only thing it does is that any failure hapenning during start_period will not be considered as an error.

Below is the sentence that make me think that :

start_period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.

I encourage you to test if this is really the case as I've never really paid any attention if the healthcheck is tested during the start period or not.
And if it is the case, you can probably increase your start_period if you're unsure about the duration and also increase the interval in order to find a good compromise.

I wrote a script that does this, though I'd rather find a native solution:

#!/bin/sh

HEALTHCHECK_FILE="/root/.healthchecked"

COMMAND=${*?"Usage: healthcheck_retry <COMMAND>"}

if [ -r "$HEALTHCHECK_FILE" ]; then
  LAST_HEALTHCHECK=$(date -r "$HEALTHCHECK_FILE" +%s)
  # FIVE_MINUTES_AGO=$(date -d 'now - 5 minutes' +%s)
  FIVE_MINUTES_AGO=$(echo "$(( $(date +%s)-5*60 ))")
  echo "Healthcheck file present";
  # if (( $LAST_HEALTHCHECK > $FIVE_MINUTES_AGO )); then
  if [ $LAST_HEALTHCHECK -gt $FIVE_MINUTES_AGO ]; then
    echo "Healthcheck too recent";
    exit 0;
  fi
fi

if $COMMAND ; then
  echo "\"$COMMAND\" succeed: updating file";
  touch $HEALTHCHECK_FILE;
  exit 0;
else
  echo "\"$COMMAND\" failed: exiting";
  exit 1;
fi

Which I use: test: /healthcheck_retry.sh curl -fsS localhost:4000/healthcheck

The pain is that I need to make sure the script is available in every container, so I have to create an extra volume for this:

    image: postgres:11.6-alpine
    volumes:
      - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM