
EC2 instances in private subnets cannot access Amazon repository

I am trying to create an ECS cluster, and I have manually built a VPC with 3 public and 3 private subnets. The route tables of all 3 public subnets have a 0.0.0.0/0 route to the IGW, and the route tables of all 3 private subnets have a 0.0.0.0/0 route to a NAT Gateway. There are 3 NAT Gateways, one in each public subnet.
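The routing layout described above can be checked mechanically. The following is a minimal sketch, assuming route table data in the shape returned by the EC2 DescribeRouteTables API; the function name, the table IDs, gateway IDs and the 10.0.0.0/16 CIDR are hypothetical placeholders, not values from my setup.

```python
# Shape mirrors the "RouteTables" entries returned by EC2 DescribeRouteTables.
# All IDs and CIDRs below are hypothetical placeholders.
ROUTE_TABLES = [
    {
        "RouteTableId": "rtb-public-a",
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"},
        ],
    },
    {
        "RouteTableId": "rtb-private-a",
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"},
        ],
    },
]


def default_route_target(route_table):
    """Return 'igw', 'nat', or 'none' for the table's 0.0.0.0/0 route."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "igw"
    return "none"


# Map each route table to the kind of gateway its default route points at.
SUMMARY = {t["RouteTableId"]: default_route_target(t) for t in ROUTE_TABLES}
print(SUMMARY)
```

A public subnet's table should report `igw` and a private subnet's table `nat`. In my case both VPCs passed this kind of check, so it only rules out the default routes as the culprit.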

I have already created another ECS cluster with the same CloudFormation template that I am trying to use now, and everything worked fine.

I have compared the settings of the 1st VPC and the 2nd (failing) VPC, and all of them (IGW, NAT Gateways, route tables, NACLs, SGs) are the same, apart from the IP addresses, which are adjusted to the 2nd VPC's ranges. When I try to create the ECS cluster in the 2nd VPC, the EC2 instances in the private subnets fail to connect to the Amazon repository, and the whole stack is subsequently rolled back because the EC2 instances never send the signal to the Auto Scaling Group.

Afterwards I checked the system logs of the EC2 instances, and they are not able to install the Amazon agent. Here is an excerpt from the logs:

Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Mon, 20 Aug 2018 06:38:04 +0000. Up 10.06 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Package upgrade failed
Aug 20 06:38:20 cloud-init[2116]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
Generating SSH2 ED25519 host key: [  OK  ]

Starting sshd: [  OK  ]

ntpdate: Synchronizing with time server: [  OK  ]

Starting ntpd: [  OK  ]

Starting sendmail: [  OK  ]

Starting sm-client: [  OK  ]

Starting crond: [  OK  ]

Starting cgconfig service: [  OK  ]

Starting docker:    .[  OK  ]

Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Mon, 20 Aug 2018 06:38:25 +0000. Up 29.91 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
Examining /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm: amazon-ssm-agent-2.3.13.0-1.x86_64
Marking /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm to be installed
Resolving Dependencies


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper
[   53.291581] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   53.297948] Bridge firewalling registered
[   53.304776] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[   53.318481] ip_tables: (C) 2000-2006 Netfilter Core Team
[   53.510300] Initializing XFRM netlink socket
[   53.515251] Netfilter messages via NETLINK v0.30.
[   53.518920] ctnetlink v0.93: registering with nfnetlink.
[   53.688086] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper


 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Disable the repository, so yum won't use it by default. Yum will then
        just ignore the repository until you permanently enable it again or use
        --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>

     4. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
/var/lib/cloud/instance/scripts/part-001: line 9: /opt/aws/bin/cfn-init: No such file or directory
/var/lib/cloud/instance/scripts/part-001: line 10: /opt/aws/bin/cfn-signal: No such file or directory
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Aug 20 06:39:13 cloud-init[2286]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed

I have checked the NACLs: for both inbound and outbound, everything is set to 0.0.0.0/0 and ALLOW.

In the 1st VPC I am using the ECS-optimized AMI on t2.large instances (no issues whatsoever), and in the 2nd VPC on c5.xlarge instances (causing issues).

What could still be preventing the EC2 instances from reaching the Amazon repository?
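To reproduce the timeout outside of cloud-init, a single request to the mirror list can be probed by hand. This is a minimal sketch, assuming a Python 3 interpreter is available on the instance; the `probe` helper and its `opener` parameter are hypothetical names introduced for illustration, and the URL is the one from the log above.

```python
import socket
import urllib.error
import urllib.request

# URL taken from the yum error in the system log.
MIRROR_LIST = "http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list"


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Classify one HTTP GET as 'ok (HTTP n)', 'timeout', or 'error: ...'."""
    try:
        with opener(url, timeout=timeout) as response:
            return "ok (HTTP %d)" % response.status
    except socket.timeout:
        # Raised directly when reading the response times out.
        return "timeout"
    except urllib.error.URLError as exc:
        # Connection timeouts are usually wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "error: %s" % exc.reason
    except OSError as exc:
        return "error: %s" % exc
```

Calling `probe(MIRROR_LIST)` on an instance in a private subnet and again on one in a public subnet separates a routing problem (timeout) from a DNS or policy problem (an error with a reason).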

Edit

Later on I found out that the 2nd VPC has an S3 endpoint attached to it. After a little more research I found a nice post on LinkedIn stating:

The Amazon Linux repositories are hosted on S3 and because of this it's necessary to allow access to it in the S3 endpoint policy.

So when you fire up yum it uses the magic of local DNS trickery to route to the internal S3 endpoint

I went on to update my CloudFormation template and added the additional policy below to the LaunchConfiguration, but that did not help:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::repo.eu-central-1.amazonaws.com",
                "arn:aws:s3:::repo.eu-central-1.amazonaws.com/*"
            ],
            "Effect": "Allow"
        }
    ]
}

And the endpoint policy looks like this:

{
    "Statement": [
        {
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*",
            "Principal": "*"
        }
    ]
}

Finally, after exploring all the sections of the AWS console, I found out what was causing the issue. As already stated in my update to the original post, when an endpoint is attached to the VPC, the EC2 instances will try to resolve packages and repositories internally. I checked every setting of the endpoint and found that only the route tables of the public subnets had been added to it. After I added the route tables of the private subnets as well, the EC2 instances could reach the packages and repositories.
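The same check can be scripted instead of clicking through the console. The sketch below assumes data in the shape returned by the EC2 DescribeVpcEndpoints and DescribeRouteTables APIs (for example via boto3's `describe_vpc_endpoints` and `describe_route_tables`); the function name and every ID are hypothetical placeholders.

```python
# Hypothetical sample data; shapes mirror one "VpcEndpoints" entry and the
# "RouteTables" list from the EC2 API. No real IDs are used.
ENDPOINT = {
    "VpcEndpointId": "vpce-example",
    "VpcEndpointType": "Gateway",
    "RouteTableIds": ["rtb-public-a", "rtb-public-b", "rtb-public-c"],
}

ROUTE_TABLES = [
    {"RouteTableId": "rtb-public-a", "Associations": [{"SubnetId": "subnet-pub-a"}]},
    {"RouteTableId": "rtb-private-a", "Associations": [{"SubnetId": "subnet-priv-a"}]},
    {"RouteTableId": "rtb-private-b", "Associations": [{"SubnetId": "subnet-priv-b"}]},
]

PRIVATE_SUBNETS = ["subnet-priv-a", "subnet-priv-b"]


def route_tables_missing_endpoint(endpoint, route_tables, subnet_ids):
    """Return IDs of route tables that serve any of `subnet_ids` but are
    not associated with the gateway endpoint.

    Only explicit subnet associations are considered; subnets that fall
    back to the VPC's main route table are not detected by this sketch.
    """
    attached = set(endpoint.get("RouteTableIds", []))
    wanted = set(subnet_ids)
    missing = []
    for table in route_tables:
        served = {a.get("SubnetId") for a in table.get("Associations", [])}
        if served & wanted and table["RouteTableId"] not in attached:
            missing.append(table["RouteTableId"])
    return missing


MISSING = route_tables_missing_endpoint(ENDPOINT, ROUTE_TABLES, PRIVATE_SUBNETS)
print(MISSING)
```

Any route table IDs this reports are the ones to add to the endpoint, either in the console as I did or, with boto3, through `modify_vpc_endpoint` and its `AddRouteTableIds` parameter.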
