Error deploying EKS node-group with terraform
I'm having a problem deploying a node group in an EKS cluster with Terraform. The error looks like a plugin problem, but I don't know how to fix it.
If I look at EC2 in the AWS console (web), I can see the cluster's instances, but the cluster shows this error.
The error shown in my pipeline:
Error: error waiting for EKS Node Group (UNIR-API-REST-CLUSTER-DEV:node_sping_boot) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [i-05ed58f8101240dc8]
  on EKS.tf line 17, in resource "aws_eks_node_group" "nodes":
  17: resource "aws_eks_node_group" "nodes" {
2020-06-01T00:03:50.576Z [DEBUG] plugin: plugin process exited: path=/home/ubuntu/.jenkins/workspace/shop_infraestucture_generator_pipline/shop-proyect-dev/.terraform/plugins/linux_amd64/terraform-provider-aws_v2.64.0_x4 pid=13475
2020-06-01T00:03:50.576Z [DEBUG] plugin: plugin exited
And the error is printed in the AWS console.
This is the Terraform code I use to create the project:
EKS.tf, for creating the cluster and the nodes:
resource "aws_eks_cluster" "CLUSTER" {
  name     = "UNIR-API-REST-CLUSTER-${var.SUFFIX}"
  role_arn = "${aws_iam_role.eks_cluster_role.arn}"
  vpc_config {
    subnet_ids = [
      "${aws_subnet.unir_subnet_cluster_1.id}", "${aws_subnet.unir_subnet_cluster_2.id}"
    ]
  }
  depends_on = [
    "aws_iam_role_policy_attachment.AmazonEKSWorkerNodePolicy",
    "aws_iam_role_policy_attachment.AmazonEKS_CNI_Policy",
    "aws_iam_role_policy_attachment.AmazonEC2ContainerRegistryReadOnly",
  ]
}

resource "aws_eks_node_group" "nodes" {
  cluster_name    = "${aws_eks_cluster.CLUSTER.name}"
  node_group_name = "node_sping_boot"
  node_role_arn   = "${aws_iam_role.eks_nodes_role.arn}"
  subnet_ids = [
    "${aws_subnet.unir_subnet_cluster_1.id}", "${aws_subnet.unir_subnet_cluster_2.id}"
  ]
  scaling_config {
    desired_size = 1
    max_size     = 5
    min_size     = 1
  }
  # instance_types is t3.medium by default
  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    "aws_iam_role_policy_attachment.AmazonEKSWorkerNodePolicy",
    "aws_iam_role_policy_attachment.AmazonEKS_CNI_Policy",
    "aws_iam_role_policy_attachment.AmazonEC2ContainerRegistryReadOnly",
  ]
}

output "eks_cluster_endpoint" {
  value = "${aws_eks_cluster.CLUSTER.endpoint}"
}

output "eks_cluster_certificat_authority" {
  value = "${aws_eks_cluster.CLUSTER.certificate_authority}"
}
securityAndGroups.tf:
resource "aws_iam_role" "eks_cluster_role" {
  name = "eks-cluster-${var.SUFFIX}"
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role" "eks_nodes_role" {
  name = "eks-node-${var.SUFFIX}"
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = "${aws_iam_role.eks_cluster_role.name}"
}

resource "aws_iam_role_policy_attachment" "AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = "${aws_iam_role.eks_cluster_role.name}"
}

resource "aws_iam_role_policy_attachment" "AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = "${aws_iam_role.eks_nodes_role.name}"
}

resource "aws_iam_role_policy_attachment" "AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = "${aws_iam_role.eks_nodes_role.name}"
}

resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = "${aws_iam_role.eks_nodes_role.name}"
}
VPCAndRouting.tf, for creating my routing, VPC, and subnets:
resource "aws_vpc" "unir_shop_vpc_dev" {
  cidr_block           = "${var.NET_CIDR_BLOCK}"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name        = "UNIR-VPC-SHOP-${var.SUFFIX}"
    Environment = "${var.SUFFIX}"
  }
}

resource "aws_route_table" "route" {
  vpc_id = "${aws_vpc.unir_shop_vpc_dev.id}"
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.unir_gat_shop_dev.id}"
  }
  tags = {
    Name        = "UNIR-RoutePublic-${var.SUFFIX}"
    Environment = "${var.SUFFIX}"
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "unir_subnet_aplications" {
  vpc_id                  = "${aws_vpc.unir_shop_vpc_dev.id}"
  cidr_block              = "${var.SUBNET_CIDR_APLICATIONS}"
  availability_zone       = "${var.ZONE_SUB}"
  depends_on              = ["aws_internet_gateway.unir_gat_shop_dev"]
  map_public_ip_on_launch = true
  tags = {
    Name        = "UNIR-SUBNET-APLICATIONS-${var.SUFFIX}"
    Environment = "${var.SUFFIX}"
  }
}

resource "aws_subnet" "unir_subnet_cluster_1" {
  vpc_id                  = "${aws_vpc.unir_shop_vpc_dev.id}"
  cidr_block              = "${var.SUBNET_CIDR_CLUSTER_1}"
  map_public_ip_on_launch = true
  availability_zone       = "${var.ZONE_SUB_CLUSTER_2}"
  tags = {
    "kubernetes.io/cluster/UNIR-API-REST-CLUSTER-${var.SUFFIX}" = "shared"
  }
}

resource "aws_subnet" "unir_subnet_cluster_2" {
  vpc_id                  = "${aws_vpc.unir_shop_vpc_dev.id}"
  cidr_block              = "${var.SUBNET_CIDR_CLUSTER_2}"
  availability_zone       = "${var.ZONE_SUB_CLUSTER_1}"
  map_public_ip_on_launch = true
  tags = {
    "kubernetes.io/cluster/UNIR-API-REST-CLUSTER-${var.SUFFIX}" = "shared"
  }
}

resource "aws_internet_gateway" "unir_gat_shop_dev" {
  vpc_id = "${aws_vpc.unir_shop_vpc_dev.id}"
  tags = {
    Environment = "${var.SUFFIX}"
    Name        = "UNIR-publicGateway-${var.SUFFIX}"
  }
}
My variables:
SUFFIX="DEV"
ZONE="eu-west-1"
TERRAFORM_USER_ID=
TERRAFORM_USER_PASS=
ZONE_SUB="eu-west-1b"
ZONE_SUB_CLUSTER_1="eu-west-1a"
ZONE_SUB_CLUSTER_2="eu-west-1c"
NET_CIDR_BLOCK="172.15.0.0/24"
SUBNET_CIDR_APLICATIONS="172.15.0.0/27"
SUBNET_CIDR_CLUSTER_1="172.15.0.32/27"
SUBNET_CIDR_CLUSTER_2="172.15.0.64/27"
SUBNET_CIDR_CLUSTER_3="172.15.0.128/27"
SUBNET_CIDR_CLUSTER_4="172.15.0.160/27"
SUBNET_CIDR_CLUSTER_5="172.15.0.192/27"
SUBNET_CIDR_CLUSTER_6="172.15.0.224/27"
MONGO_SSH_KEY=
KIBANA_SSH_KEY=
CLUSTER_SSH_KEY=
Do you need more logs?
There is also a quick way to troubleshoot this kind of issue. You can use the "AWSSupport-TroubleshootEKSWorkerNode" runbook, which is designed to help troubleshoot EKS worker nodes that fail to join an EKS cluster. Go to AWS Systems Manager -> Automation -> select the runbook -> execute the runbook with the ClusterName and the instance-id.
This is very helpful for troubleshooting and provides a good summary at the end of the execution.
You can also refer to the documentation here.
According to the AWS documentation:
If you receive the error "Instances failed to join the kubernetes cluster" in the AWS Management Console, ensure that either the cluster's private endpoint access is enabled, or that you have correctly configured CIDR blocks for public endpoint access. For more information, see Amazon EKS cluster endpoint access control.
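In the AWS provider version used here, the endpoint settings live inside `vpc_config` of `aws_eks_cluster`. A minimal sketch of enabling private endpoint access for this cluster (only the two `endpoint_*` lines are new; the rest of the resource stays as in the question):

```terraform
resource "aws_eks_cluster" "CLUSTER" {
  name     = "UNIR-API-REST-CLUSTER-${var.SUFFIX}"
  role_arn = "${aws_iam_role.eks_cluster_role.arn}"
  vpc_config {
    subnet_ids = [
      "${aws_subnet.unir_subnet_cluster_1.id}", "${aws_subnet.unir_subnet_cluster_2.id}"
    ]
    endpoint_private_access = true # lets nodes reach the API server from inside the VPC
    endpoint_public_access  = true # keeps kubectl access from outside (the default)
  }
}
```

With `endpoint_private_access = true`, worker nodes can register with the control plane even when they have no route to the public endpoint.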
I noticed that you are swapping the subnets' availability zones:

resource "aws_subnet" "unir_subnet_cluster_1" {
  vpc_id                  = "${aws_vpc.unir_shop_vpc_dev.id}"
  cidr_block              = "${var.SUBNET_CIDR_CLUSTER_1}"
  map_public_ip_on_launch = true
  availability_zone       = "${var.ZONE_SUB_CLUSTER_2}"

You have assigned var.ZONE_SUB_CLUSTER_2 to unir_subnet_cluster_1 and var.ZONE_SUB_CLUSTER_1 to unir_subnet_cluster_2. Perhaps this mix-up is the cause of the misconfiguration.
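If the swap is unintentional, the fix is simply to point each subnet at its matching zone variable. Only the availability_zone lines change; all other arguments stay as in the question:

```terraform
resource "aws_subnet" "unir_subnet_cluster_1" {
  # ... other arguments unchanged ...
  availability_zone = "${var.ZONE_SUB_CLUSTER_1}"
}

resource "aws_subnet" "unir_subnet_cluster_2" {
  # ... other arguments unchanged ...
  availability_zone = "${var.ZONE_SUB_CLUSTER_2}"
}
```

Note that either way the two subnets end up in different zones, so this alone may not explain the join failure, but it keeps the names and zones consistent.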
As described under "NodeCreationFailure", this error has two likely causes:
NodeCreationFailure: Your launched instances are unable to register with your Amazon EKS cluster. Common causes of this failure are insufficient node IAM role permissions or lack of outbound internet access for the nodes.
Your nodes must be able to reach the internet using a public IP address in order to function.
In my case, the cluster was inside private subnets, and after adding a route to the NAT gateway the error went away.
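The NAT-gateway fix can be sketched in the same Terraform style. All resource names below are hypothetical, and it assumes an existing public subnet (one whose route table points at the internet gateway):

```terraform
# Elastic IP for the NAT gateway (hypothetical names throughout).
resource "aws_eip" "nat" {
  vpc = true
}

# The NAT gateway itself must live in a public subnet.
resource "aws_nat_gateway" "nat" {
  allocation_id = "${aws_eip.nat.id}"
  subnet_id     = "${aws_subnet.public.id}"
}

# Route table for the private subnets hosting the node group:
# send all outbound traffic through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = "${aws_vpc.unir_shop_vpc_dev.id}"
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = "${aws_nat_gateway.nat.id}"
  }
}

resource "aws_route_table_association" "private_1" {
  subnet_id      = "${aws_subnet.unir_subnet_cluster_1.id}"
  route_table_id = "${aws_route_table.private.id}"
}
```

A second `aws_route_table_association` would be needed for the other cluster subnet; the point is that every private subnet running nodes gets a default route through the NAT gateway.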
I ran into the same problem and solved it by creating another NAT gateway, placing it in a public subnet, and routing my private subnets to the new NAT gateway.