
AWS DMS replication task from Postgres RDS to Redshift getting AccessDenied on S3 bucket

We have deployed a DMS replication task to replicate our entire Postgres database to Redshift. The tables are created with the correct schemas, but the data isn't making it through to Redshift; it is getting held up in the S3 bucket DMS uses as an intermediate step. This is all deployed via Terraform.

We've configured the IAM roles as described in the replication instance Terraform docs, with all three of the dms-access-for-endpoint, dms-cloudwatch-logs-role, and dms-vpc-role IAM roles created. The IAM roles are deployed from a different stack than the one DMS is deployed from, as the roles are also used by another, successfully deployed, DMS instance running a different task.

data "aws_iam_policy_document" "dms_assume_role_document" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      identifiers = [
        "s3.amazonaws.com",
        "iam.amazonaws.com",
        "redshift.amazonaws.com",
        "dms.amazonaws.com",
        "redshift-serverless.amazonaws.com"
      ]
      type        = "Service"
    }
  }
}

# Database Migration Service requires the below IAM Roles to be created before
# replication instances can be created. See the DMS Documentation for
# additional information: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.html#CHAP_Security.APIRole
#  * dms-vpc-role
#  * dms-cloudwatch-logs-role
#  * dms-access-for-endpoint
resource "aws_iam_role" "dms_access_for_endpoint" {
  name                  = "dms-access-for-endpoint"
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role"]
  force_detach_policies = true
}

resource "aws_iam_role" "dms_cloudwatch_logs_role" {
  name                  = "dms-cloudwatch-logs-role"
  description           = "Allow DMS to manage CloudWatch logs."
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole"]
  force_detach_policies = true
}

resource "aws_iam_role" "dms_vpc_role" {
  name                  = "dms-vpc-role"
  description           = "DMS IAM role for VPC permissions"
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"]
  force_detach_policies = true
}
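
For context, the Redshift target endpoint has to reference the dms-access-for-endpoint role and the intermediate bucket explicitly. A minimal sketch of that wiring is below; the endpoint name, server, database, and credential references are hypothetical placeholders, not from our actual configuration:

```hcl
# Hypothetical sketch: the Redshift target endpoint references the
# intermediate S3 bucket and the dms-access-for-endpoint role, so DMS
# can stage CSVs in S3 and have Redshift COPY them in.
resource "aws_dms_endpoint" "redshift_target" {
  endpoint_id   = "redshift-target" # placeholder name
  endpoint_type = "target"
  engine_name   = "redshift"
  server_name   = "example-cluster.abc123.eu-west-2.redshift.amazonaws.com" # placeholder
  port          = 5439
  database_name = "example"             # placeholder
  username      = "example_user"        # placeholder
  password      = var.redshift_password # assumed variable

  redshift_settings {
    bucket_name             = aws_s3_bucket.dms_redshift_intermediate.id
    service_access_role_arn = aws_iam_role.dms_access_for_endpoint.arn
  }
}
```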

However, at runtime, we're seeing the following logs in CloudWatch:

2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Not retriable error: <AccessDenied> Access Denied [1001705]  (anw_retry_strategy.cpp:118)
2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Failed to list bucket 'dms-sandbox-redshift-intermediate-storage': error code <AccessDenied>: Access Denied [1001713]  (s3_dir_actions.cpp:105)
2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Failed to list bucket 'dms-sandbox-redshift-intermediate-storage' [1001713]  (s3_dir_actions.cpp:209)

We also enabled S3 server access logging on the bucket itself to see whether it would give us more information. This is what we're seeing (anonymised):

<id> dms-sandbox-redshift-intermediate-storage [01/Sep/2022:15:43:32 +0000] 10.128.69.80 arn:aws:sts::<account>:assumed-role/dms-access-for-endpoint/dms-session-for-replication-engine <code> REST.GET.BUCKET - "GET /dms-sandbox-redshift-intermediate-storage?delimiter=%2F&max-keys=1000 HTTP/1.1" 403 AccessDenied 243 - 30 - "-" "aws-sdk-cpp/1.8.80/S3/Linux/4.14.276-211.499.amzn2.x86_64 x86_64 GCC/4.9.3" - <code> SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3.eu-west-2.amazonaws.com TLSv1.2 -

The above suggests that a session named dms-session-for-replication-engine (assumed under the dms-access-for-endpoint role) is what is receiving the AccessDenied responses, but we're unable to pinpoint what this is and how to fix it.

We attempted to add a bucket policy to the S3 bucket itself, but this did not work (the configuration below also includes the S3 server access logs bucket):

resource "aws_s3_bucket" "dms_redshift_intermediate" {
  # Prefixed with `dms-` as that's what the AmazonDMSRedshiftS3Role policy filters on
  bucket = "dms-sandbox-redshift-intermediate-storage"
}

resource "aws_s3_bucket_logging" "log_bucket" {
  bucket        = aws_s3_bucket.dms_redshift_intermediate.id
  target_bucket = aws_s3_bucket.log_bucket.id
  target_prefix = "log/"
}

resource "aws_s3_bucket" "log_bucket" {
  bucket = "${aws_s3_bucket.dms_redshift_intermediate.id}-logs"
}

resource "aws_s3_bucket_acl" "log_bucket" {
  bucket = aws_s3_bucket.log_bucket.id
  acl    = "log-delivery-write"
}

resource "aws_s3_bucket_policy" "dms_redshift_intermediate_policy" {
  bucket = aws_s3_bucket.dms_redshift_intermediate.id
  policy = data.aws_iam_policy_document.dms_redshift_intermediate_policy_document.json
}

data "aws_iam_policy_document" "dms_redshift_intermediate_policy_document" {
  statement {
    actions = [
      "s3:*"
    ]

    principals {
      identifiers = [
        "dms.amazonaws.com",
        "redshift.amazonaws.com"
      ]
      type = "Service"
    }

    resources = [
      aws_s3_bucket.dms_redshift_intermediate.arn,
      "${aws_s3_bucket.dms_redshift_intermediate.arn}/*"
    ]
  }
}

How do we fix the <AccessDenied> issues we're seeing in CloudWatch and get data loading into Redshift? DMS is able to PUT objects into the S3 bucket, as we're seeing encrypted CSVs appear there (the server access logs also confirm this), but DMS is then unable to GET the files back out of it for Redshift. The AccessDenied responses also suggest this is an IAM role issue rather than a security group issue, but our IAM roles are configured as per the docs, so we're confused about what could be causing it.

You are right that this is an IAM role issue. Make sure the role in question has the following statements added to its policy document:

{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket"
  ],
  "Resource": "arn:aws:s3:::<yourbucketnamehere>"
},
{
  "Effect": "Allow",
  "Action": [
    "s3:ListAllMyBuckets",
    "s3:GetBucketLocation"
  ],
  "Resource": "arn:aws:s3:::*"
}
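
Since the question's setup is managed by Terraform, one way to attach these statements is an inline role policy; this is a sketch under the assumption that the bucket name from the question is used and that the statements belong on the dms-access-for-endpoint role:

```hcl
# Sketch: grant the DMS endpoint role permission to list the
# intermediate bucket (the missing s3:ListBucket is what produces the
# "Failed to list bucket" AccessDenied errors in the question's logs).
data "aws_iam_policy_document" "dms_s3_list" {
  statement {
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::dms-sandbox-redshift-intermediate-storage"]
  }

  statement {
    effect    = "Allow"
    actions   = ["s3:ListAllMyBuckets", "s3:GetBucketLocation"]
    resources = ["arn:aws:s3:::*"]
  }
}

resource "aws_iam_role_policy" "dms_s3_list" {
  name   = "dms-s3-list" # hypothetical policy name
  role   = aws_iam_role.dms_access_for_endpoint.id
  policy = data.aws_iam_policy_document.dms_s3_list.json
}
```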

What we thought was an IAM issue was actually a security group issue. The COPY command for Redshift was unable to access S3. By adding a 443 egress rule for HTTPS to the Redshift security group, we were able to pull data through again:

resource "aws_security_group_rule" "https_443_egress" {
  type              = "egress"
  description       = "Allow HTTPS egress from the Redshift SG"
  protocol          = "tcp"
  to_port           = 443
  from_port         = 443
  security_group_id = aws_security_group.redshift.id
  cidr_blocks       = ["0.0.0.0/0"]
}
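
If opening egress to 0.0.0.0/0 is broader than you'd like, a gateway VPC endpoint for S3 keeps the Redshift-to-S3 traffic on the AWS network instead; a sketch, assuming hypothetical VPC and route table references and the question's eu-west-2 region:

```hcl
# Sketch: an S3 gateway endpoint routes S3 traffic privately, so the
# Redshift COPY can reach the bucket without wide-open internet egress.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id              # assumed VPC reference
  service_name      = "com.amazonaws.eu-west-2.s3" # region from the question
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id] # assumed route table
}
```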

So if you're experiencing the same issue as in the question, check whether Redshift has access to S3 via HTTPS.
