CI/CD, Terraform and AWS ECS: Applying database migrations using Lambda?

I have an app consisting of multiple services, each with its own postgres database. I want to deploy it to AWS. Kube is too complicated for me, so I decided to use AWS ECS for services + AWS RDS for DBs. And deploy everything using Terraform.

I have a CI/CD pipeline set up which, upon a merge to the staging branch, builds, tests, and deploys the app to the corresponding environment. Deploying basically consists of building and pushing docker images to AWS ECR and then calling terraform plan/apply.

Terraform creates/updates the VPC, subnets, ECS services with tasks, RDS instances, etc.

This works.

But I'm not sure how to apply db migrations.

I have a separate console app whose only purpose is to apply migrations and then quit. So I can just run it in the CI/CD pipeline before or after applying terraform. However, before doesn't work because if it's the very first deployment then the databases wouldn't exist yet, and after doesn't work because I want to first apply migrations and then start services, not the other way around.

So I need some way to run this migrator console app in the middle of the terraform deployment – after RDS but before ECS.

I read an article by Andrew Lock where he solves this exact problem by using jobs and init containers in Kubernetes. But I'm not using Kube, so that's not an option for me.

I see in the AWS ECS docs that you can run standalone (one-off) tasks, which is basically what I need, and you can run them with the AWS CLI. But whilst I can use the CLI from the pipeline, I can't use it in the middle of terraform doing its thing. I can't just tell terraform "run some random command after creating this resource, but before that one".

Then I thought about using AWS Lambda. There is a data source type in Terraform called aws_lambda_invocation, which does exactly what it says in the name. So now I'm thinking about building a docker image of the migrator in the build stage of the pipeline, pushing it to AWS ECR, then in terraform creating an aws_lambda_function resource from the image and an aws_lambda_invocation data source invoking the function. Make ECS depend on the invocation, and it should work, right?
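Something like this is what I have in mind (a rough sketch; the variable, IAM role, subnet, and security group names are just placeholders):

resource "aws_lambda_function" "migrator" {
  function_name = "migrator"
  package_type  = "Image"                  # container image instead of a zip package
  image_uri     = var.migrator_image_uri   # the migrator image pushed to ECR by the pipeline
  role          = aws_iam_role.migrator_lambda.arn
  timeout       = 300                      # migrations can take a while

  # the function has to sit in the VPC to be able to reach RDS
  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.migrator_lambda.id]
  }
}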

There is one problem with this: data sources are queried both when planning and when applying, but I only want the migrator lambda to run when applying. I think it could be solved by using the count attribute and some custom variable in the invocation data source.

I think this approach might work, but surely there must be a better, less convoluted way of doing it? Any recommendations?

Note: I can't apply migrations from the services themselves, because I have more than one instance of each, so there is a possibility of two services trying to apply migrations to the same db at the same time, which would end badly.

If you are wondering: I use .NET 5 and GitLab, but I don't think that's relevant to the question.

Well, in case you are wondering, the lambda solution that I described in the question is valid. It's not super convenient, but it works. In terraform you first need to create a function connected to the VPC in which your database lives, add all the necessary entries to the database security group for ingress and the lambda security group for egress (a sketch of those rules follows the snippet below), and then call it something like this (here I pass the connection string as an argument):

data "aws_lambda_invocation" "migrator" {
  count         = var.apply_migrations == "yes" ? 1 : 0
  function_name = aws_lambda_function.migrator.function_name
  input         = <<JSON
"Host=${aws_db_instance.service_a.address};Port=${aws_db_instance.service_a.port};Database=${aws_db_instance.service_a.db_name};Username=${aws_db_instance.service_a.username};Password=${aws_db_instance.service_a.password};"
JSON
}
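The security group entries mentioned above might look roughly like this (a sketch; the security group resource names are placeholders for whatever you have):

# let the lambda reach postgres
resource "aws_security_group_rule" "lambda_to_db" {
  type                     = "egress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.migrator_lambda.id
  source_security_group_id = aws_security_group.database.id
}

# and let postgres accept connections from the lambda
resource "aws_security_group_rule" "db_from_lambda" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.database.id
  source_security_group_id = aws_security_group.migrator_lambda.id
}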

In the invocation above, make apply_migrations = "no" by default. Then you only need to specify it when applying – terraform apply -var apply_migrations=yes.
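The variable itself is trivial (a sketch):

variable "apply_migrations" {
  type    = string
  default = "no"
}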

Then just make aws_ecs_service (or whatever you use to deploy your application) depend on the invocation.
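For example (a sketch; everything except depends_on is elided):

resource "aws_ecs_service" "service_a" {
  # ... the usual service configuration ...

  # don't create/update the service until the migrations have run
  depends_on = [data.aws_lambda_invocation.migrator]
}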

The biggest problem with this solution is that running terraform destroy takes a very long time. This is because, to connect the lambda to the VPC, AWS automatically creates a network interface for it (so it is not managed by terraform). When destroy deletes the lambda, the interface stays in the "In Use" state for some time after the deletion (it varies – 10 minutes or more – and you can't even delete it manually). That leaves terraform unable to delete the subnet used by the interface, which means terraform hangs for a long time.

But it doesn't really matter, because I found a much better solution, which takes more setup, but works flawlessly.

It turns out that terraform can run arbitrary commands. There is a docker provider available for it, and you can basically spin up any container you want to do whatever you want.

terraform {
  # ...

  required_providers {
    # ...

    docker = {
      source  = "kreuzwerker/docker"
      version = "2.16.0"
    }
  }
}

# this setup works for gitlab ci/cd with docker-in-docker
provider "docker" {
  host = "tcp://docker:2376"

  ca_material   = file("/certs/client/ca.pem")
  cert_material = file("/certs/client/cert.pem")
  key_material  = file("/certs/client/key.pem")

  registry_auth {
    address  = var.image_registry_uri
    # username and password are passed via DOCKER_REGISTRY_USER and DOCKER_REGISTRY_PASS env vars
  }
}

data "docker_registry_image" "migrator" {
  name = var.migrator_image_uri
}

resource "docker_image" "migrator" {
  name          = data.docker_registry_image.migrator.name
  pull_triggers = [data.docker_registry_image.migrator.sha256_digest]
}

resource "docker_container" "migrator" {
  name     = "migrator"
  image    = docker_image.migrator.repo_digest
  attach   = true # terraform will wait for container to finish before proceeding
  must_run = false # it's a one-time job container, not a daemon
  env = [
    "BASTION_PRIVATE_KEY=${var.bastion_private_key}",
    "BASTION_HOST=${aws_instance.bastion.public_ip}",
    "BASTION_USER=ec2-user",
    "DATABASE_HOST=${aws_db_instance.service_a.address}",
    "DATABASE_PORT=${aws_db_instance.service_a.port}",
    "DATABASE_NAME=${aws_db_instance.service_a.db_name}",
    "DATABASE_USER=${aws_db_instance.service_a.username}",
    "DATABASE_PASSWORD=${aws_db_instance.service_a.password}"
  ]
}
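As with the lambda variant, you presumably still want the services to wait for the migrations, so make your aws_ecs_service resources depend on docker_container.migrator via depends_on.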

As you can see, you need a bastion instance set up, but you would probably need one anyway. Then, in the migrator program, you need to use an SSH tunnel to connect to the db. That shouldn't be a problem; SSH packages are available for every language. Here's a .NET Core example (using SSH.NET):

// top-level statements in the migrator console project
using System;
using System.IO;
using Microsoft.EntityFrameworkCore;            // for dbContext.Database.Migrate()
using Microsoft.Extensions.DependencyInjection; // for CreateScope()/GetRequiredService()
using Renci.SshNet;                             // SSH.NET: SshClient, PrivateKeyFile, ForwardedPortLocal

// read the bastion private key from the environment into an in-memory stream
using var stream = new MemoryStream();
using var writer = new StreamWriter(stream);
writer.Write(Environment.GetEnvironmentVariable("BASTION_PRIVATE_KEY"));
writer.Flush();
stream.Position = 0;

using var keyFile = new PrivateKeyFile(stream);

using var client = new SshClient(
    Environment.GetEnvironmentVariable("BASTION_HOST"),
    Environment.GetEnvironmentVariable("BASTION_USER"),
    keyFile
);

client.Connect();

var localhost = "127.0.0.1";
uint localPort = 5432;

var dbHost = Environment.GetEnvironmentVariable("DATABASE_HOST");
var dbPort = uint.Parse(Environment.GetEnvironmentVariable("DATABASE_PORT"));
var dbName = Environment.GetEnvironmentVariable("DATABASE_NAME");
var dbUser = Environment.GetEnvironmentVariable("DATABASE_USER");
var dbPassword = Environment.GetEnvironmentVariable("DATABASE_PASSWORD");

using var tunnel = new ForwardedPortLocal(localhost, localPort, dbHost, dbPort);
client.AddForwardedPort(tunnel);

tunnel.Start();

var dbConnectionString = $"Host={localhost};Port={localPort};Database={dbName};Username={dbUser};Password={dbPassword};";

var host = ServiceA.Api.Program
    .CreateHostBuilder(args: new[] { "ConnectionStrings:ServiceA=" + dbConnectionString })
    .Build();

using (var scope = host.Services.CreateScope()) {
    var dbContext = scope
        .ServiceProvider
        .GetRequiredService<ServiceADbContext>();

    dbContext.Database.Migrate();
}

tunnel.Stop();
client.Disconnect();

In GitLab CI/CD, the terraform jobs use:

image:
  name: hashicorp/terraform:1.1.6
  entrypoint:
    - "/usr/bin/env"
    - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

services:
  - docker:19.03.12-dind

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_REGISTRY_USER: "AWS"
  # set DOCKER_REGISTRY_PASS after authenticating to the registry
