简体   繁体   中英

AWS Kinesis Firehose unable to index data into AWS Elasticsearch

I am trying to send data from Amazon Kinesis Data Firehose to Amazon Elasticsearch Service , but it's logging an error saying 503 Service Unavailable . However, I can reach the Elasticsearch endpoint ( https://vpc-XXX.<region>.es.amazonaws.com ) and make queries on it. I also went through How can I prevent HTTP 503 Service Unavailable errors in Amazon Elasticsearch Service? and can confirm my setup have enough resources.

Here's the error I get in my S3 backup bucket that holds the failed logs:

{
    "attemptsMade": 8,
    "arrivalTimestamp": 1599748282943,
    "errorCode": "ES.ServiceException",
    "errorMessage": "Error received from Elasticsearch cluster. <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>",
    "attemptEndingTimestamp": 1599748643460,
    "rawData": "eyJ0aWNrZXJfc3ltYm9sIjoiQUxZIiwic2VjdG9yIjoiRU5FUkdZIiwiY2hhbmdlIjotNi4zNSwicHJpY2UiOjg4LjgzfQ==",
    "subsequenceNumber": 0,
    "esDocumentId": "49610662085822146490768158474738345331794592496281976834.0",
    "esIndexName": "prometheus-2020-09",
    "esTypeName": ""
},

Anyone have any ideas how to fix this and have the data indexed into Elasticsearch?

Turns out, my issue was with selecting the wrong security group.


I was using the same security group (I named it elasticsearch-${domain_name} ) as attached to the Elasticsearch instance (which allowed TCP ingress/egress to/from port 443 from the firehose_es security group). I should have selected the firehose_es security group instead.

As requested in the comment, here's the Terraform configuration for the firehose_es SG.

resource "aws_security_group" "firehose_es" {
  name        = "firehose_es"
  description = "Firehose to send logs to Elasticsearch"
  vpc_id      = module.networking.aws_vpc_id
}

resource "aws_security_group_rule" "firehose_es_https_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = aws_security_group.firehose_es.id
  cidr_blocks       = ["10.0.0.0/8"]
}

resource "aws_security_group_rule" "firehose_es_https_egress" {
  type                     = "egress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.firehose_es.id
  source_security_group_id = aws_security_group.elasticsearch.id
}

Another thing which I fixed prior to asking this question (which may be why some of you are reaching this question) is to use the right role and attach the right policy to the role. Here's my role (as Terraform config)

// https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html
data "aws_iam_policy_document" "firehose_es_policy_specific" {
  statement {
    actions = [
      "s3:AbortMultipartUpload",
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:PutObject"
    ]
    resources = [
      aws_s3_bucket.firehose.arn,
      "${aws_s3_bucket.firehose.arn}/*"
    ]
  }

  statement {
    actions = [
      "es:DescribeElasticsearchDomain",
      "es:DescribeElasticsearchDomains",
      "es:DescribeElasticsearchDomainConfig",
      "es:ESHttpPost",
      "es:ESHttpPut"
    ]

    resources = [
      var.elasticsearch_domain_arn,
      "${var.elasticsearch_domain_arn}/*",
    ]
  }

  statement {
    actions = [
      "es:ESHttpGet"
    ]

    resources = [
      "${var.elasticsearch_domain_arn}/_all/_settings",
      "${var.elasticsearch_domain_arn}/_cluster/stats",
      "${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_mapping/type-name",
      "${var.elasticsearch_domain_arn}/_nodes",
      "${var.elasticsearch_domain_arn}/_nodes/stats",
      "${var.elasticsearch_domain_arn}/_nodes/*/stats",
      "${var.elasticsearch_domain_arn}/_stats",
      "${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_stats"
    ]
  }

  statement {
    actions = [
      "ec2:DescribeVpcs",
      "ec2:DescribeVpcAttribute",
      "ec2:DescribeSubnets",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeNetworkInterfaces",
      "ec2:CreateNetworkInterface",
      "ec2:CreateNetworkInterfacePermission",
      "ec2:DeleteNetworkInterface",
    ]

    resources = [
      "*"
    ]
  }
}

resource "aws_kinesis_firehose_delivery_stream" "ecs" {
  name        = "${var.name_prefix}${var.name}_${var.app}"
  destination = "elasticsearch"

  s3_configuration {
    role_arn           = aws_iam_role.firehose_es.arn
    bucket_arn         = aws_s3_bucket.firehose.arn
    buffer_interval    = 60
    compression_format = "GZIP"
  }

  elasticsearch_configuration {
    domain_arn = var.elasticsearch_domain_arn
    role_arn   = aws_iam_role.firehose_es.arn

    # If Firehose cannot deliver to Elasticsearch, logs are sent to S3
    s3_backup_mode = "FailedDocumentsOnly"

    buffering_interval = 60
    buffering_size     = 5

    index_name            = "${var.name_prefix}${var.name}_${var.app}"
    index_rotation_period = "OneMonth"

    vpc_config {
      subnet_ids         = var.elasticsearch_subnet_ids
      security_group_ids = [var.firehose_security_group_id]
      role_arn           = aws_iam_role.firehose_es.arn
    }
  }
}

I was able to figure our my mistake after reading through the Controlling Access with Amazon Kinesis Data Firehose article again.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM