
CI/CD, Terraform and AWS ECS: Applying database migrations using Lambda?

I have an app consisting of multiple services, each with its own Postgres database, and I want to deploy it to AWS. Kubernetes is too complicated for me, so I decided to use AWS ECS for the services and AWS RDS for the databases, and to deploy everything with Terraform.

I have a CI/CD pipeline set up which, upon a merge to the staging branch, builds, tests, and deploys the app to the corresponding environment. Deploying basically consists of building and pushing Docker images to AWS ECR and then calling terraform plan/apply.

Terraform creates/updates the VPC, subnets, ECS services with tasks, RDS instances, etc.

This works.

But I'm not sure how to apply database migrations.

I have a separate console app whose only purpose is to apply migrations and then quit, so I could just run it in the CI/CD pipeline before or after applying Terraform. However, running it before doesn't work, because on the very first deployment the databases won't exist yet; and running it after doesn't work either, because I want to apply migrations first and then start the services, not the other way around.

So I need some way to run this migrator console app in the middle of the Terraform deployment: after RDS, but before ECS.

I read an article by Andrew Lock where he solves this exact problem using jobs and init containers in Kubernetes. But I'm not using Kubernetes, so that's not an option for me.

I see in the AWS ECS docs that you can run standalone (one-off) tasks, which is basically what I need, and that you can run them with the AWS CLI. But while I can call the CLI from the pipeline, I can't call it in the middle of Terraform doing its thing; I can't just tell Terraform "run some arbitrary command after creating this resource, but before that one".

Then I thought about using AWS Lambda. Terraform has a data source called aws_lambda_invocation, which does exactly what the name says. So now I'm thinking of building a Docker image of the migrator in the build stage of the pipeline, pushing it to AWS ECR, then in Terraform creating an aws_lambda_function resource from that image and an aws_lambda_invocation data source invoking the function. Make ECS depend on the invocation, and it should work, right?

There is one problem with this: data sources are queried during both plan and apply, but I only want the migrator Lambda to run during apply. I think this could be solved with the count attribute and a custom variable on the invocation data source.

I think this approach might work, but surely there must be a better, less convoluted way of doing it? Any recommendations?

Note: I can't apply migrations from the services themselves, because I have more than one instance of each, so two instances could try to apply migrations to the same database at the same time, which would end badly.

In case you are wondering, I use .NET 5 and GitLab, but I don't think that's relevant to the question.

Well, in case you are wondering, the Lambda solution I described in the question is valid. It's not super convenient, but it works. In Terraform, you first need to create a function attached to the VPC your database lives in, add the necessary rules to the DB security group (ingress) and the Lambda security group (egress), and then invoke it something like this (here I pass the connection string as an argument):
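A rough sketch of what that container-image function resource might look like (the role, subnet, and security-group references here are hypothetical placeholders; adjust them to your own setup):

```hcl
resource "aws_lambda_function" "migrator" {
  function_name = "migrator"
  package_type  = "Image"                    # container-image Lambda, pulled from ECR
  image_uri     = var.migrator_image_uri     # image pushed by the pipeline's build stage
  role          = aws_iam_role.migrator.arn  # needs the AWSLambdaVPCAccessExecutionRole policy
  timeout       = 300                        # migrations can take a while

  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.migrator_lambda.id]
  }
}
```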

data "aws_lambda_invocation" "migrator" {
  count         = var.apply_migrations == "yes" ? 1 : 0
  function_name = aws_lambda_function.migrator.function_name
  input         = <<JSON
"Host=${aws_db_instance.service_a.address};Port=${aws_db_instance.service_a.port};Database=${aws_db_instance.service_a.db_name};Username=${aws_db_instance.service_a.username};Password=${aws_db_instance.service_a.password};"
JSON
}

Make apply_migrations = "no" the default. Then you only need to specify it when applying: terraform apply -var apply_migrations=yes.
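A minimal declaration for that variable could look like this (the name matches the count expression in the invocation above; the default keeps the migrator disabled during ordinary plans):

```hcl
variable "apply_migrations" {
  description = "Set to \"yes\" to run the migrator Lambda during this apply"
  type        = string
  default     = "no"
}
```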

Then just make aws_ecs_service (or whatever you use to deploy your application) depend on the invocation.
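Because the invocation uses count, the dependency has to reference the data source as a whole, something like this (a sketch with a hypothetical service name; the rest of the service configuration is elided):

```hcl
resource "aws_ecs_service" "service_a" {
  # ... the usual cluster / task definition / network configuration ...

  # Wait for the migrator invocation (when enabled) before (re)starting tasks.
  # Referencing the data source itself works whether count is 0 or 1.
  depends_on = [data.aws_lambda_invocation.migrator]
}
```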

The biggest problem with this solution is that running terraform destroy takes a very long time. To connect the Lambda to the VPC, AWS automatically creates a network interface for it (so the interface is not managed by Terraform). When destroy deletes the Lambda, the interface stays in the "In Use" state for some time afterwards (it varies, 10 minutes or more, and you can't even delete it manually). Terraform then can't delete the subnet used by the interface, so terraform destroy hangs for a long time.

But that doesn't really matter, because I found a much better solution, which takes more setup but works flawlessly.

It turns out Terraform can run arbitrary commands. There is a Docker provider available for it, so you can spin up basically any container to do whatever you want.

terraform {
  # ...

  required_providers {
    # ...

    docker = {
      source  = "kreuzwerker/docker"
      version = "2.16.0"
    }
  }
}

# this setup works for gitlab ci/cd with docker-in-docker
provider "docker" {
  host = "tcp://docker:2376"

  ca_material   = file("/certs/client/ca.pem")
  cert_material = file("/certs/client/cert.pem")
  key_material  = file("/certs/client/key.pem")

  registry_auth {
    address  = var.image_registry_uri
    # username and password are passed via DOCKER_REGISTRY_USER and DOCKER_REGISTRY_PASS env vars
  }
}

data "docker_registry_image" "migrator" {
  name = var.migrator_image_uri
}

resource "docker_image" "migrator" {
  name          = data.docker_registry_image.migrator.name
  pull_triggers = [data.docker_registry_image.migrator.sha256_digest]
}

resource "docker_container" "migrator" {
  name     = "migrator"
  image    = docker_image.migrator.repo_digest
  attach   = true # terraform will wait for container to finish before proceeding
  must_run = false # it's a one-time job container, not a daemon
  env = [
    "BASTION_PRIVATE_KEY=${var.bastion_private_key}",
    "BASTION_HOST=${aws_instance.bastion.public_ip}",
    "BASTION_USER=ec2-user",
    "DATABASE_HOST=${aws_db_instance.service_a.address}",
    "DATABASE_PORT=${aws_db_instance.service_a.port}",
    "DATABASE_NAME=${aws_db_instance.service_a.db_name}",
    "DATABASE_USER=${aws_db_instance.service_a.username}",
    "DATABASE_PASSWORD=${aws_db_instance.service_a.password}"
  ]
}

As you can see, you need a bastion instance set up, but you would probably need one anyway. Then, in the migrator program, you need to use an SSH tunnel to connect to the database. That shouldn't be a problem; SSH packages are available for every language. Here's a .NET Core example (using the SSH.NET library):

// Requires the SSH.NET NuGet package (Renci.SshNet);
// ServiceA.Api.Program and ServiceADbContext are the app's own types.
using System;
using System.IO;
using Microsoft.Extensions.DependencyInjection;
using Renci.SshNet;

// Load the bastion's private key from the environment into a key file
using var stream = new MemoryStream();
using var writer = new StreamWriter(stream);
writer.Write(Environment.GetEnvironmentVariable("BASTION_PRIVATE_KEY"));
writer.Flush();
stream.Position = 0;

using var keyFile = new PrivateKeyFile(stream);

using var client = new SshClient(
    Environment.GetEnvironmentVariable("BASTION_HOST"),
    Environment.GetEnvironmentVariable("BASTION_USER"),
    keyFile
);

client.Connect();

var localhost = "127.0.0.1";
uint localPort = 5432;

var dbHost = Environment.GetEnvironmentVariable("DATABASE_HOST");
var dbPort = uint.Parse(Environment.GetEnvironmentVariable("DATABASE_PORT"));
var dbName = Environment.GetEnvironmentVariable("DATABASE_NAME");
var dbUser = Environment.GetEnvironmentVariable("DATABASE_USER");
var dbPassword = Environment.GetEnvironmentVariable("DATABASE_PASSWORD");

using var tunnel = new ForwardedPortLocal(localhost, localPort, dbHost, dbPort);
client.AddForwardedPort(tunnel);

tunnel.Start();

var dbConnectionString = $"Host={localhost};Port={localPort};Database={dbName};Username={dbUser};Password={dbPassword};";

var host = ServiceA.Api.Program
    .CreateHostBuilder(args: new[] { "ConnectionStrings:ServiceA=" + dbConnectionString })
    .Build();

using (var scope = host.Services.CreateScope()) {
    var dbContext = scope
        .ServiceProvider
        .GetRequiredService<ServiceADbContext>();

    dbContext.Database.Migrate();
}

tunnel.Stop();
client.Disconnect();

In GitLab CI/CD, the Terraform jobs use:

image:
  name: hashicorp/terraform:1.1.6
  entrypoint:
    - "/usr/bin/env"
    - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

services:
  - docker:19.03.12-dind

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_REGISTRY_USER: "AWS"
  # set DOCKER_REGISTRY_PASS after authenticating to the registry

