Why won't my AWS ECS service start my task?
I'm having trouble with a new AWS load balancer and an ECR repository, ECS cluster, and task that I created in AWS with Terraform. Everything creates without errors. There are some IAM roles and a certificate in separate files; the relevant definitions are below. What happens is that the ECS service creates a task, but the task shuts down immediately after starting. I don't see any logs in the CloudWatch log group at all. In fact, the log group is never even created.
It made sense to me that the whole thing wouldn't run the first time I stood up the infrastructure, since the ECR repository was brand new and had no Docker image pushed to it. But I have since pushed an image, and the service has never started. I assumed it would retry the task in an endless loop after each failure, but it doesn't.
I forced it to restart by destroying the service and recreating it. Given that there is now an image to run, I expected it to work. It shows the same behavior as the initial launch: the service creates a task that fails to start, nothing is logged about why, and no task is ever run again.
Does anyone know what is wrong here, or where I might look to see an error?
locals {
  container_name = "tdweb-web-server-container"
}

resource "aws_lb" "web_server" {
  name               = "tdweb-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.lb_sg.id]
  subnets = [
    aws_subnet.subnet_a.id,
    aws_subnet.subnet_b.id,
    aws_subnet.subnet_c.id
  ]
}

resource "aws_security_group" "lb_sg" {
  name        = "ALB Security Group"
  description = "Allows TLS inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "TLS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_security_group" "web_server_service" {
  name        = "Web Server Service Security Group"
  description = "Allows HTTP inbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP from VPC"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_alb_listener" "https" {
  load_balancer_arn = aws_lb.web_server.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = aws_acm_certificate.main.arn

  default_action {
    target_group_arn = aws_lb_target_group.web_server.arn
    type             = "forward"
  }
}

resource "random_string" "target_group_suffix" {
  length  = 4
  upper   = false
  special = false
}

resource "aws_lb_target_group" "web_server" {
  name        = "web-server-target-group-${random_string.target_group_suffix.result}"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.main.id

  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_iam_role" "web_server_task" {
  name               = "tdweb-web-server-task-role"
  assume_role_policy = data.aws_iam_policy_document.web_server_task.json
}

data "aws_iam_policy_document" "web_server_task" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "web_server_task" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonSQSFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AWSLambdaInvocation-DynamoDB"
  ])
  role       = aws_iam_role.web_server_task.name
  policy_arn = each.value
}
resource "aws_ecr_repository" "web_server" {
  name = "tdweb-web-server-repository"
}

resource "aws_ecs_cluster" "web_server" {
  name = "tdweb-web-server-cluster"
}

resource "aws_ecs_task_definition" "web_server" {
  family                   = "task_definition_name"
  task_role_arn            = aws_iam_role.web_server_task.arn
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  network_mode             = "awsvpc"
  cpu                      = "1024"
  memory                   = "2048"
  requires_compatibilities = ["FARGATE"]

  container_definitions = <<DEFINITION
[
  {
    "name": "${local.container_name}",
    "image": "${aws_ecr_repository.web_server.repository_url}:latest",
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/tdweb-task",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "portMappings": [
      {
        "hostPort": 80,
        "protocol": "tcp",
        "containerPort": 80
      }
    ],
    "cpu": 0,
    "essential": true
  }
]
DEFINITION
}
resource "aws_ecs_service" "web_server" {
  name            = "tdweb-web-server-service"
  cluster         = aws_ecs_cluster.web_server.id
  launch_type     = "FARGATE"
  task_definition = aws_ecs_task_definition.web_server.arn
  desired_count   = 1

  load_balancer {
    target_group_arn = aws_lb_target_group.web_server.arn
    container_name   = local.container_name
    container_port   = 80
  }

  network_configuration {
    subnets = [
      aws_subnet.subnet_a.id,
      aws_subnet.subnet_b.id,
      aws_subnet.subnet_c.id
    ]
    assign_public_ip = true
    security_groups  = [aws_security_group.web_server_service.id]
  }
}
Edit: to answer a comment, here are the VPC and subnets:
resource "aws_vpc" "main" {
  cidr_block = "172.31.0.0/16"
}

resource "aws_subnet" "subnet_a" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-east-1a"
  cidr_block        = "172.31.0.0/20"
}

resource "aws_subnet" "subnet_b" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-east-1b"
  cidr_block        = "172.31.16.0/20"
}

resource "aws_subnet" "subnet_c" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-east-1c"
  cidr_block        = "172.31.32.0/20"
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
Edit: here is a somewhat enlightening update. I found that the error was not in the task's logs but in the logs of the container inside the task. I never knew to look there.
Stopped reason: CannotPullContainerError: Error response from daemon: Get https://563407091361.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It seems the service cannot pull the container from the ECR repository. After reading up on this, I still don't know how to fix it. I'm still looking around.
As noted in the comments, one likely problem is the lack of internet access in the subnets. That can be corrected as follows:
# Route table to connect to the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "subnet_public_a" {
  subnet_id      = aws_subnet.subnet_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "subnet_public_b" {
  subnet_id      = aws_subnet.subnet_b.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "subnet_public_c" {
  subnet_id      = aws_subnet.subnet_c.id
  route_table_id = aws_route_table.public.id
}
You can also add depends_on to your aws_ecs_service so that it waits for these associations to be in place.
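That suggestion could look like the sketch below; only the depends_on argument is new, and every other argument of the service stays exactly as defined above:

```hcl
resource "aws_ecs_service" "web_server" {
  # ... all other arguments unchanged ...

  # Wait until the public routes exist before the service
  # launches a task that has to pull its image from ECR.
  depends_on = [
    aws_route_table_association.subnet_public_a,
    aws_route_table_association.subnet_public_b,
    aws_route_table_association.subnet_public_c,
  ]
}
```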
A shorter alternative for the associations:
locals {
  subnets = [
    aws_subnet.subnet_a.id,
    aws_subnet.subnet_b.id,
    aws_subnet.subnet_c.id
  ]
}

resource "aws_route_table_association" "subnet_public" {
  count          = length(local.subnets)
  subnet_id      = local.subnets[count.index]
  route_table_id = aws_route_table.public.id
}
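Separately, the fact that the CloudWatch log group was never created is expected: by default the awslogs log driver does not create the group it writes to, so /ecs/tdweb-task has to exist before any container logs can appear. A minimal sketch of the missing resource (the resource label and retention value are assumptions):

```hcl
# The awslogs driver writes to this group but does not create it,
# so define it explicitly before the task runs.
resource "aws_cloudwatch_log_group" "web_server" {
  name              = "/ecs/tdweb-task"
  retention_in_days = 30 # assumption: choose whatever retention suits you
}
```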