
Why can't my ECS service register available EC2 instances with my ELB?

I've got an EC2 launch configuration that launches instances from the ECS-optimized AMI. I've got an auto scaling group that ensures that I've got at least two available instances at all times. Finally, I've got a load balancer.

I'm trying to create an ECS service that distributes my tasks across the instances in the load balancer.

After reading the documentation for ECS load balancing, it's my understanding that my ASG should not automatically register my EC2 instances with the ELB, because ECS takes care of that. So, my ASG does not specify an ELB. Likewise, my ELB does not have any registered EC2 instances.

When I create my ECS service, I choose the ELB and also select the ecsServiceRole. After creating the service, I never see any instances available in the ECS Instances tab. The service also fails to start any tasks, with a very generic error of ...

service was unable to place a task because the resources could not be found.

I've been at this for about two days now and can't figure out which settings are misconfigured. Does anybody have any ideas as to what might be causing this not to work?
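
For reference, here's roughly what I'm doing, expressed as an AWS CLI sketch (all names are placeholders for my actual values):

#!/bin/bash
# Create the service on the cluster and attach the classic ELB; ECS (via the
# ecsServiceRole) is what registers container instances with the ELB, not the ASG
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 2 \
  --load-balancers loadBalancerName=my-elb,containerName=web,containerPort=80 \
  --role ecsServiceRole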

Update @ 06/25/2015:

I think this may have something to do with the ECS_CLUSTER user data setting.

In my EC2 auto scaling launch configuration, if I leave the user data input completely empty, the instances are created with an ECS_CLUSTER value of "default". When this happens, I see an automatically-created cluster named "default". In this default cluster, I see the instances and can register tasks with the ELB as expected. My ELB health check (HTTP) passes once the tasks are registered with the ELB, and all is good in the world.

But if I change that ECS_CLUSTER setting to something custom, I never see a cluster created with that name. If I manually create a cluster with that name, the instances never become visible within the cluster. I can't ever register tasks with the ELB in this scenario.
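
One way to see what the agent actually picked up is its local introspection API (a diagnostic sketch, run on the container instance itself):

#!/bin/bash
# Show the cluster name the agent was configured with via user data
cat /etc/ecs/ecs.config

# The agent's introspection API (port 51678) reports the cluster and
# container instance ARN once registration succeeds
curl -s http://localhost:51678/v1/metadata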

Any ideas?

In the end, it ended up being that my EC2 instances were not being assigned public IP addresses. It appears ECS needs to be able to directly communicate with each EC2 instance, which would require each instance to have a public IP. I was not assigning my container instances public IP addresses because I thought I'd have them all behind a public load balancer, and each container instance would be private.
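
If you hit the same symptom, one quick check is whether the instance has any outbound route to the regional ECS endpoint at all, since that is what the agent needs in order to register (a sketch; replace us-east-1 with your region):

#!/bin/bash
# From the container instance: verify outbound connectivity to the regional
# ECS endpoint; any HTTP status code printed means the route out works
curl -s -o /dev/null -w '%{http_code}\n' https://ecs.us-east-1.amazonaws.com/

# The agent log shows repeated connection timeouts when there is no route out
tail -n 50 "$(ls -t /var/log/ecs/ecs-agent.* | head -n 1)"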

I had similar symptoms but ended up finding the answer in the log files:

/var/log/ecs/ecs-agent.2016-04-06-03:

2016-04-06T03:05:26Z [ERROR] Error registering: AccessDeniedException: User: arn:aws:sts::<removed>:assumed-role/<removed>/<removed> is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-west-2:<removed>:cluster/MyCluster-PROD
    status code: 400, request id: <removed>

In my case, the resource existed but was not accessible. It sounds like OP is pointing at a resource that doesn't exist or isn't visible. Are your clusters and instances in the same region? The logs should confirm the details.
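
To run the same checks yourself (a sketch; the cluster name and region are placeholders taken from the log above):

#!/bin/bash
# Search the agent logs for registration failures like the one above
grep -i "Error registering" /var/log/ecs/ecs-agent.*

# Confirm the cluster exists and is ACTIVE in the region the instance uses
aws ecs describe-clusters --clusters MyCluster-PROD --region us-west-2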

In response to other posts:

You do NOT need public IP addresses.

You do need: an IAM role with ECS permissions (typically ecsInstanceRole) assigned to the EC2 instance so the agent can talk to the ECS service. You must also specify the ECS cluster name, which can be done via user data during instance launch or in the launch configuration definition, like so:

#!/bin/bash
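# Tell the ECS agent which cluster to register this instance with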
echo ECS_CLUSTER=GenericServiceECSClusterPROD >> /etc/ecs/ecs.config

If you missed this at launch, you can set it after the instance has launched and then restart the ECS agent.
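
On the ECS-optimized Amazon Linux AMI, where the agent runs as an upstart service, that looks roughly like this (a sketch, run on the instance):

#!/bin/bash
# Append the cluster name, then restart the agent so it re-registers
echo ECS_CLUSTER=GenericServiceECSClusterPROD | sudo tee -a /etc/ecs/ecs.config
sudo stop ecs && sudo start ecs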

Another problem that might arise is not assigning a role with the proper policy to the launch configuration. My role didn't have the AmazonEC2ContainerServiceforEC2Role policy (or the permissions that it contains) as specified here.
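
If that's your problem too, attaching the managed policy to the instance role is one fix (a sketch; ecsInstanceRole is an assumed role name):

#!/bin/bash
# Attach the AWS managed policy for ECS container instances to the role
# referenced by the launch configuration's instance profile
aws iam attach-role-policy \
  --role-name ecsInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role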

Another possibility: the ECS agent stores the cluster name in a state file under /var/lib/ecs/data.

If the agent first starts up with the cluster name of 'default', you'll need to delete this file and then restart the agent.
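
A sketch of that cleanup (on the agent versions I've seen, the checkpoint file is named ecs_agent_data.json; verify the name on your instance first):

#!/bin/bash
# Stop the agent, remove the saved state that pins the old cluster name,
# then start the agent so it re-registers using /etc/ecs/ecs.config
sudo stop ecs
sudo rm /var/lib/ecs/data/ecs_agent_data.json
sudo start ecs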

You definitely do not need public IP addresses for each of your private instances. The correct (and safest) way to do this is to set up a NAT Gateway and attach that gateway to the route table that is attached to your private subnet.

This is documented in detail in the VPC documentation, specifically Scenario 2: VPC with Public and Private Subnets (NAT).
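
A minimal CLI sketch of that wiring (all IDs are placeholders):

#!/bin/bash
# Create a NAT gateway in a PUBLIC subnet, backed by an allocated Elastic IP
aws ec2 create-nat-gateway \
  --subnet-id subnet-0publicexample \
  --allocation-id eipalloc-0example

# Send the private subnet's internet-bound traffic through the NAT gateway
aws ec2 create-route \
  --route-table-id rtb-0privateexample \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0example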

There were several layers of problems in our case. I will list them out to give you some idea of the issues to pursue.

My goal was to have one ECS host. But ECS forces you to have two subnets in your VPC, each with one Docker host instance. I was trying to have just one Docker host in a single availability zone and could not get it to work.

The other issue was that only one of the subnets had an internet-facing gateway attached to it, so the other was not reachable from the public internet.

The end result was that DNS served two IPs for my ELB, and only one of them worked. So I was seeing intermittent 404s when accessing the load balancer via its public DNS name.
