
AWS Kinesis Firehose unable to index data into AWS Elasticsearch

I am trying to send data from Amazon Kinesis Data Firehose to Amazon Elasticsearch Service, but it's logging an error saying 503 Service Unavailable. However, I can reach the Elasticsearch endpoint (https://vpc-XXX.<region>.es.amazonaws.com) and make queries on it. I also went through "How can I prevent HTTP 503 Service Unavailable errors in Amazon Elasticsearch Service?" and can confirm my setup has enough resources.

Here's the error I get in my S3 backup bucket that holds the failed logs:

{
    "attemptsMade": 8,
    "arrivalTimestamp": 1599748282943,
    "errorCode": "ES.ServiceException",
    "errorMessage": "Error received from Elasticsearch cluster. <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>",
    "attemptEndingTimestamp": 1599748643460,
    "rawData": "eyJ0aWNrZXJfc3ltYm9sIjoiQUxZIiwic2VjdG9yIjoiRU5FUkdZIiwiY2hhbmdlIjotNi4zNSwicHJpY2UiOjg4LjgzfQ==",
    "subsequenceNumber": 0,
    "esDocumentId": "49610662085822146490768158474738345331794592496281976834.0",
    "esIndexName": "prometheus-2020-09",
    "esTypeName": ""
},

Does anyone have any ideas on how to fix this and get the data indexed into Elasticsearch?

It turns out my issue was that I had selected the wrong security group.


I was using the same security group that is attached to the Elasticsearch instance (I named it elasticsearch-${domain_name}), which allows TCP ingress/egress on port 443 to/from the firehose_es security group. I should have selected the firehose_es security group instead.

As requested in the comments, here's the Terraform configuration for the firehose_es SG.

resource "aws_security_group" "firehose_es" {
  name        = "firehose_es"
  description = "Firehose to send logs to Elasticsearch"
  vpc_id      = module.networking.aws_vpc_id
}

resource "aws_security_group_rule" "firehose_es_https_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = aws_security_group.firehose_es.id
  cidr_blocks       = ["10.0.0.0/8"]
}

resource "aws_security_group_rule" "firehose_es_https_egress" {
  type                     = "egress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.firehose_es.id
  source_security_group_id = aws_security_group.elasticsearch.id
}
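
For the other side of the connection, the security group attached to the Elasticsearch domain has to admit HTTPS from firehose_es. The following is only a minimal sketch of what such a rule might look like, assuming aws_security_group.elasticsearch (referenced in the egress rule above) is the group attached to the domain; the rule name elasticsearch_https_from_firehose is hypothetical and not part of my original configuration.

```hcl
# Hypothetical counterpart rule (the name is an assumption): allow HTTPS into
# the Elasticsearch domain's security group from members of firehose_es.
resource "aws_security_group_rule" "elasticsearch_https_from_firehose" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.elasticsearch.id
  source_security_group_id = aws_security_group.firehose_es.id
}
```

With a rule like this in place, the delivery stream should be given the firehose_es group in its vpc_config, not the domain's own group.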

Another thing I fixed before asking this question (which may be why some of you landed here) was to use the right role and attach the right policy to it. Here's my policy and delivery stream (as Terraform config):

// https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html
data "aws_iam_policy_document" "firehose_es_policy_specific" {
  statement {
    actions = [
      "s3:AbortMultipartUpload",
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:PutObject"
    ]
    resources = [
      aws_s3_bucket.firehose.arn,
      "${aws_s3_bucket.firehose.arn}/*"
    ]
  }

  statement {
    actions = [
      "es:DescribeElasticsearchDomain",
      "es:DescribeElasticsearchDomains",
      "es:DescribeElasticsearchDomainConfig",
      "es:ESHttpPost",
      "es:ESHttpPut"
    ]

    resources = [
      var.elasticsearch_domain_arn,
      "${var.elasticsearch_domain_arn}/*",
    ]
  }

  statement {
    actions = [
      "es:ESHttpGet"
    ]

    resources = [
      "${var.elasticsearch_domain_arn}/_all/_settings",
      "${var.elasticsearch_domain_arn}/_cluster/stats",
      "${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_mapping/type-name",
      "${var.elasticsearch_domain_arn}/_nodes",
      "${var.elasticsearch_domain_arn}/_nodes/stats",
      "${var.elasticsearch_domain_arn}/_nodes/*/stats",
      "${var.elasticsearch_domain_arn}/_stats",
      "${var.elasticsearch_domain_arn}/${var.name_prefix}${var.name}_${var.app}*/_stats"
    ]
  }

  statement {
    actions = [
      "ec2:DescribeVpcs",
      "ec2:DescribeVpcAttribute",
      "ec2:DescribeSubnets",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeNetworkInterfaces",
      "ec2:CreateNetworkInterface",
      "ec2:CreateNetworkInterfacePermission",
      "ec2:DeleteNetworkInterface",
    ]

    resources = [
      "*"
    ]
  }
}

resource "aws_kinesis_firehose_delivery_stream" "ecs" {
  name        = "${var.name_prefix}${var.name}_${var.app}"
  destination = "elasticsearch"

  s3_configuration {
    role_arn           = aws_iam_role.firehose_es.arn
    bucket_arn         = aws_s3_bucket.firehose.arn
    buffer_interval    = 60
    compression_format = "GZIP"
  }

  elasticsearch_configuration {
    domain_arn = var.elasticsearch_domain_arn
    role_arn   = aws_iam_role.firehose_es.arn

    # If Firehose cannot deliver to Elasticsearch, logs are sent to S3
    s3_backup_mode = "FailedDocumentsOnly"

    buffering_interval = 60
    buffering_size     = 5

    index_name            = "${var.name_prefix}${var.name}_${var.app}"
    index_rotation_period = "OneMonth"

    vpc_config {
      subnet_ids         = var.elasticsearch_subnet_ids
      security_group_ids = [var.firehose_security_group_id]
      role_arn           = aws_iam_role.firehose_es.arn
    }
  }
}
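
The configuration above references aws_iam_role.firehose_es without showing it. As a rough sketch under stated assumptions, the role could be defined with a trust policy that lets the Firehose service assume it, with the policy document above attached inline. The data source name firehose_assume_role and the name values are hypothetical; only aws_iam_role.firehose_es and firehose_es_policy_specific come from the configuration above.

```hcl
# Trust policy: lets the Kinesis Data Firehose service assume the role.
# The data source name is an assumption made for this sketch.
data "aws_iam_policy_document" "firehose_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

# The role referenced as aws_iam_role.firehose_es in the delivery stream.
resource "aws_iam_role" "firehose_es" {
  name               = "firehose_es" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
}

# Attach the permissions policy defined above as an inline role policy.
resource "aws_iam_role_policy" "firehose_es" {
  name   = "firehose_es" # hypothetical name
  role   = aws_iam_role.firehose_es.id
  policy = data.aws_iam_policy_document.firehose_es_policy_specific.json
}
```

An inline aws_iam_role_policy keeps the sketch short; a managed aws_iam_policy with an aws_iam_role_policy_attachment would work just as well.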

I was able to figure out my mistake after reading through the Controlling Access with Amazon Kinesis Data Firehose article again.
