AWS Systems Manager start session: An error occurred (TargetNotConnected) when calling the StartSession operation: <instance_id> is not connected

Problem:

When I try to connect from my local machine to a running EC2 instance using the AWS Systems Manager Session Manager CLI command: aws ssm start-session --target i-123456

I get the error:

An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.

Background:

  • Linux 2 instance hosted on a private subnet within a custom VPC
  • VPC endpoints used to connect Systems Manager to managed instances without the need for a NAT GW or IGW.
  • Endpoint Service Names:
    com.amazonaws.us-west-2.s3
    com.amazonaws.us-west-2.ec2
    com.amazonaws.us-west-2.ec2messages
    com.amazonaws.us-west-2.ssm
    com.amazonaws.us-west-2.ssmmessages
  • AWS CLI == 2.0.40
  • Python == 3.7.4
  • Custom Terraform module to launch an Airflow instance within one of the private subnets (see module "airflow_aws_resources" below)
  • The only .tf file relevant to this problem is airflow.tf within the module "airflow_aws_resources". This file contains the security group and instance profile configuration for the EC2 instance that is being connected to via SSM.

Reproduce with Terraform:

module "airflow_aws_resources" {
  source                      = "github.com/marshall7m/tf_modules/airflow-aws-resources"
  resource_prefix             = "test"
  vpc_id                      = module.vpc.vpc_id
  env                         = "testing"
  private_bucket              = "test-bucket"
  private_subnets_ids         = module.vpc.private_subnets
  private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks

  create_airflow_instance     = true
  create_airflow_instance_sg  = true
  create_airflow_db           = false
  create_airflow_db_sg        = false
  airflow_instance_ssm_access = true
  airflow_instance_ssm_region = "us-west-2"

  airflow_instance_ami  = "ami-0841edc20334f9287"
  airflow_instance_type = "t2.micro"

}

resource "aws_security_group" "vpc_endpoints" {
  name        = "test-vpc-endpoint-sg"
  description = "Default security group for vpc endpoints"
  vpc_id = module.vpc.vpc_id
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    #private subnet cidr blocks
    cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
  }

  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
  }
  egress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
  }
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  version = "2.44.0"
  name = "test-vpc" 
  cidr = "10.0.0.0/24"

  azs = ["us-west-2a", "us-west-2b"]
  
  private_subnets = ["10.0.0.32/28", "10.0.0.64/28"]
  private_dedicated_network_acl = true
  private_subnet_suffix = "private"

  public_subnets = ["10.0.0.96/28", "10.0.0.128/28"]
  public_dedicated_network_acl = true
  public_subnet_suffix = "public"

  enable_s3_endpoint = true

  enable_ec2messages_endpoint = true
  ec2messages_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
  enable_ec2_endpoint = true
  ec2_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]

  enable_ssm_endpoint = true
  ssm_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
  enable_ssmmessages_endpoint = true
  ssmmessages_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]

  enable_nat_gateway = false
  single_nat_gateway = false
  enable_vpn_gateway = false

  create_database_subnet_route_table = false
  create_database_internet_gateway_route = false
  create_database_subnet_group = false
   
  manage_default_network_acl = false 
  enable_dns_hostnames = true
  enable_dns_support = true
  
  private_inbound_acl_rules = [
    {
      "description": "Allows inbound https traffic for aws s3 package requests"
      "cidr_block": "0.0.0.0/0",
      "from_port": 443,
      "to_port": 443,
      "protocol": "tcp",
      "rule_action": "allow",
      "rule_number": 101
    },
    { 
      "description": "Allows inbound http traffic for aws s3 package requests"
      "cidr_block": "0.0.0.0/0",
      "from_port": 80,
      "to_port": 80,
      "protocol": "tcp",
      "rule_action": "allow",
      "rule_number": 102
    }
  ]
  private_outbound_acl_rules = [
    {
      "description": "Allows outbound https traffic for aws s3 package requests"
      "cidr_block": "0.0.0.0/0",
      "from_port": 443,
      "to_port": 443,
      "protocol": "tcp",
      "rule_action": "allow",
      "rule_number": 101
    },
    { 
      "description": "Allows outbound http traffic for aws s3 package requests"
      "cidr_block": "0.0.0.0/0",
      "from_port": 80,
      "to_port": 80,
      "protocol": "tcp",
      "rule_action": "allow",
      "rule_number": 102
    }
  ]
  
  vpc_endpoint_tags = {
    type = "vpc-endpoint"
  }
}

Attempts:

#1

I tried the troubleshooting tips within the EC2 console SSM page (AWS EC2 console >> instance-id >> Connect >> Session Manager):

[Image: console SSM]

  1. The SSM agent is already pre-installed on AWS Linux instance types. I double-checked anyway by accessing the instance via SSH and running sudo status amazon-ssm-agent, which returned: amazon-ssm-agent start/running, process 1234

  2. The EC2 instance profile displayed above includes the required AmazonSSMManagedInstanceCore policy

  3. I completed the Session Manager prerequisites.

#2

Attached AmazonSSMFullAccess to the user that runs the command: aws ssm start-session --target i-123456

Same error while connecting the instance via SSM:

An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.

#3

Added HTTPS inbound/outbound traffic from the VPC endpoint's associated private subnet to the EC2 instance security group (see airflow.tf)

Same error:

An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.

#4

Within the Systems Manager console I used the Quick Setup option and configured it with the instance profile specified in airflow.tf and the default Systems Manager role. The EC2 instance successfully registered under "Managed instances" on the Quick Setup page.

Same error:

An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.

#5

Given this is a test VPC and EC2 instance, I tried allowing all types of traffic from all IPv4 sources (0.0.0.0/0) for the following resources:

  • Private subnets NACL
  • EC2 instance security group
  • The security group associated with the following interface/gateway endpoints:
com.amazonaws.us-west-2.s3
com.amazonaws.us-west-2.ec2
com.amazonaws.us-west-2.ec2messages
com.amazonaws.us-west-2.ssm
com.amazonaws.us-west-2.ssmmessages

Same error while connecting the instance via SSM:

An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.

I would refer here to make sure you have everything set up properly. I would first add the profile argument. If that still doesn't work: I ran into a similar issue when my profile's default region was not the region in which I wanted to start a session, so I needed to pass the region argument as well. Sample .ssh/config below:

host ssh i-abc123
ProxyCommand sh -c "aws --region desired_region --profile my_profile ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"

I would also encourage using AWS CLI v2. Once you configure your .ssh/config to look like that above, simply execute the following in a CLI:

ssh i-abc123

So you might need to use a profile. I am using the AWS CLI on OSX to connect from the terminal to a Linux host in a VPC. This is an account only accessible via SSO. I was able to create a profile, and after authenticating to SSO via the CLI I can establish a connection like this.

Do this once:

aws sso login --profile my_customer

Then verify that the SSO login was successful with a trivial command (on my OSX terminal):

aws s3 ls --profile my_customer custbucket-s3-sftp/rds/

Now establish the Session Manager connection:

aws ssm start-session --profile my_customer --target i-0012345abcdef890

I know you are using Python, but maybe this helps.

In some cases, you have to verify the following:

  • AWS account/profile
  • AWS region

In one case, I found that the CLI was trying to connect with the wrong AWS profile.

In another case, I was connecting to a different region.
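A quick way to check both points is to print what a profile actually resolves to before starting a session. This is only a sketch: show_cli_context is a hypothetical helper name and my_profile is a placeholder; the two CLI calls it wraps are aws sts get-caller-identity and aws configure get region.

```shell
# Hypothetical helper: print the account and region that a named profile resolves to.
show_cli_context() {
  profile="${1:-default}"
  # Account ID the profile's credentials belong to.
  account=$(aws sts get-caller-identity --profile "$profile" --query Account --output text) || return 1
  # Region configured for the profile (empty if none is set).
  region=$(aws configure get region --profile "$profile") || region=""
  printf 'profile=%s account=%s region=%s\n' "$profile" "$account" "${region:-<unset>}"
}
```

Running show_cli_context my_profile and comparing the account and region with the ones the instance lives in should show whether the session request is going to the wrong place.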

In my case, I had to wait about 10 minutes after attaching an IAM role to the EC2 instance through the AWS console.
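Instead of guessing how long to wait, you could poll until Systems Manager reports the instance as Online. A minimal sketch, assuming the instance shows up in aws ssm describe-instance-information once the agent has registered; wait_until_online and its default values (20 attempts, 30 seconds apart, roughly 10 minutes) are my own choices, not anything prescribed by AWS.

```shell
# Hypothetical helper: poll until Systems Manager reports the instance as Online.
# Arguments: instance ID, number of attempts (default 20), delay in seconds (default 30).
wait_until_online() {
  instance_id="$1"; attempts="${2:-20}"; delay="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(aws ssm describe-instance-information \
      --filters "Key=InstanceIds,Values=$instance_id" \
      --query 'InstanceInformationList[0].PingStatus' \
      --output text) || status=""
    # Succeed as soon as the agent has checked in.
    [ "$status" = "Online" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  # The instance never reported Online within the allotted attempts.
  return 1
}
```

For example: wait_until_online i-abc123 && aws ssm start-session --target i-abc123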

I was also getting the same error when I tried to connect from my Terminal: An error occurred (TargetNotConnected) when calling the StartSession operation: i-122334455 is not connected.

In my case, the issue was that the SSM Agent installed on the target instance was out of date. I discovered this by trying to start the session from Systems Manager in the AWS console: Systems Manager -> Fleet Manager -> {INSTANCE_ID} -> Instance Actions -> Start Session. When I tried that, I got an error message saying that the SSM Agent on the target EC2 instance was out of date. After updating it, I was able to log in successfully.

To update, you can either enable SSM agent auto-update for all managed instances, update the particular instance manually, or do selective update of the managed instances. See the following documentation for info:
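The agent state can also be read from the CLI. The sketch below assumes the instance is already registered as a managed instance; agent_info and agent_is_outdated are hypothetical helper names, while PingStatus, AgentVersion and IsLatestVersion are fields returned by aws ssm describe-instance-information.

```shell
# Hypothetical helper: print PingStatus, AgentVersion and IsLatestVersion for one instance.
agent_info() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].[PingStatus,AgentVersion,IsLatestVersion]' \
    --output text
}

# Succeeds (exit 0) when Systems Manager says the agent is not the latest version.
agent_is_outdated() {
  set -- $(agent_info "$1")
  # $3 is IsLatestVersion as printed by the text output format.
  case "$3" in
    [Ff]alse) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: agent_is_outdated i-abc123 && echo "update the SSM Agent first"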

I ran into a similar issue. If you attempt to start a session on a managed node that is located in a different AWS account or AWS Region, you will see this kind of error. For example, my instance is located in us-east-2 while my default AWS profile is set to us-east-1, so when I ran "aws ssm start-session --target instance_number" it returned the error, because SSM in us-east-1 does not know about instances in us-east-2. To fix the problem, I overrode the region and ran "aws ssm start-session --target instance_number --region us-east-2". I was able to connect to the instance with no issues.
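If you switch regions often, a small wrapper that always passes the region explicitly avoids relying on the profile default. This is just a sketch; start_session_in is a hypothetical name, and the profile argument is optional.

```shell
# Hypothetical wrapper: start a session with an explicit region.
# $1 = instance ID, $2 = region, $3 = optional profile name
start_session_in() {
  if [ -n "${3:-}" ]; then
    aws ssm start-session --target "$1" --region "$2" --profile "$3"
  else
    aws ssm start-session --target "$1" --region "$2"
  fi
}
```

For example: start_session_in i-abc123 us-east-2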

Explanation: Unfortunately, EC2 instances are not fault tolerant; underneath your server there is a host system. As a best practice, you should add another instance as a backup to prevent a single point of failure.

TargetNotConnected when you try to SSM/SSH into your host can have several causes: host hardware failure, connectivity or power issues, a software memory leak (running out of memory), a full disk that is never cleaned up, or an application that cannot handle edge cases and crashes.

In some of these cases the EC2 instance state may still be running even though the reachability check fails.

When you run aws ec2 describe-instance-status --instance-ids <instance-id> you might notice that the instance state is running although the status check fails.

Example :

request: aws ec2 describe-instance-status --instance-ids i-abc123

response:

{
    "InstanceStatuses": [
        {
            "AvailabilityZone": "us-west-1b",
            "InstanceId": "i-abc123",
            "InstanceState": {
                "Code": 16,
                "Name": "running"
            },
            "InstanceStatus": {
                "Details": [
                    {
                        "ImpairedSince": "2020-10-10T12:10:00+00:00",
                        "Name": "reachability",
                        "Status": "failed"
                    }
                ],
                "Status": "impaired"
            },
            "SystemStatus": {
                "Details": [
                    {
                        "Name": "reachability",
                        "Status": "passed"
                    }
                ],
                "Status": "ok"
            }
        }
    ]
}
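To avoid reading the whole JSON by eye, the three relevant fields can be pulled out with --query. A minimal sketch: instance_health and is_impaired are hypothetical helper names, and the check simply treats any instance or system status other than "ok" on a running instance as a problem.

```shell
# Hypothetical helper: print instance state, instance status and system status on one line.
instance_health() {
  aws ec2 describe-instance-status \
    --instance-ids "$1" \
    --query 'InstanceStatuses[0].[InstanceState.Name,InstanceStatus.Status,SystemStatus.Status]' \
    --output text
}

# Succeeds (exit 0) when the instance is running but at least one status check is not "ok".
is_impaired() {
  set -- $(instance_health "$1")
  [ "$1" = "running" ] && { [ "$2" != "ok" ] || [ "$3" != "ok" ]; }
}
```

For the response shown above, instance_health would print running, impaired and ok, so is_impaired i-abc123 succeeds.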

The solution would be to recreate the instance if it's a hardware issue (through an IaC platform such as Terraform / CloudFormation, or manually of course); if it's an application issue, connect to the machine and solve the specific problem.

Do your Interface type VPC endpoints have private DNS enabled?

Session Manager appears to need private_dns_enabled = true in Terraform VPC endpoints of Interface type in order to work.
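One way to check this on an existing VPC is to list the endpoints together with their private DNS setting. A sketch under the assumption that all the endpoints live in one VPC; interface_endpoints_without_private_dns is a hypothetical helper name and vpc-123 below is a placeholder ID.

```shell
# Hypothetical helper: print the service names of Interface endpoints in a VPC
# that do not have private DNS enabled.
interface_endpoints_without_private_dns() {
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].[ServiceName,VpcEndpointType,PrivateDnsEnabled]' \
    --output text |
    # Columns: service name, endpoint type, private DNS flag.
    awk '$2 == "Interface" && tolower($3) != "true" { print $1 }'
}
```

If interface_endpoints_without_private_dns vpc-123 prints the ssm, ssmmessages or ec2messages service names, those are the endpoints to look at first. Gateway endpoints such as S3 are skipped because the setting only applies to Interface endpoints.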

I ran into this after making some changes with Terraform that modified the EC2 instance in place. It turned out that all I needed to do was reboot the EC2 instance, and then I could connect again.

A private EC2 instance requires Internet access (to reach the SSM endpoint).

Therefore, it needs to be set up with a NAT gateway.

Yet another possible gotcha:

I confused the security group on the VPC endpoints with the security group that was attached to my EC2 instance. At first I interpreted it as giving that security group (and instance) access to my VPC endpoints.

Instead, I needed to create a new security group that specifies the inbound/outbound traffic allowed on my VPC endpoints. From the AWS docs:

The security group attached to the VPC endpoint must allow incoming connections on port 443 from the private subnet of the managed instance. If incoming connections aren't allowed, then the managed instance can't connect to the SSM and EC2 endpoints.

So I added rules to allow all HTTPS traffic in/out of the VPC endpoint.

This is separate from the security group on the EC2 instance, for which I allowed all outbound traffic and no inbound traffic.

As soon as I added the new VPC security group to each of my VPC endpoints, the instance appeared in session manager as being connected and ready to start sessions.
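From the CLI, the inbound rule described above could look roughly like this. It is a sketch, not the exact rule set I used: allow_https_from_subnet is a hypothetical helper name, and the security group ID and CIDR in the usage line are placeholders for the endpoint security group and the instance's private subnet.

```shell
# Hypothetical helper: allow inbound HTTPS to the endpoint security group from one subnet.
# $1 = security group ID attached to the VPC endpoints, $2 = private subnet CIDR
allow_https_from_subnet() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 443 \
    --cidr "$2"
}
```

For example: allow_https_from_subnet sg-0123456789abcdef0 10.0.0.32/28. Note that this modifies the security group, so run it only against the group that is attached to the endpoints.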
