
AWS DMS replication task from Postgres RDS to Redshift getting AccessDenied on S3 bucket

We have deployed a DMS replication task to replicate our entire Postgres database to Redshift. The tables are being created with the correct schemas, but the data is not making it into Redshift and is instead being held in the S3 bucket DMS uses as an intermediate step. This is all deployed via Terraform.
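For reference, the piece that ties the intermediate bucket and the dms-access-for-endpoint role together is the Redshift target endpoint. A minimal sketch of what that wiring can look like in Terraform is below; the endpoint name, connection details and variables are illustrative rather than our exact configuration, and the bucket and role references point at resources shown further down.

resource "aws_dms_endpoint" "redshift_target" {
  # Hypothetical endpoint definition; connection details are placeholders.
  endpoint_id   = "sandbox-redshift-target"
  endpoint_type = "target"
  engine_name   = "redshift"
  server_name   = var.redshift_host
  port          = 5439
  database_name = var.redshift_database
  username      = var.redshift_username
  password      = var.redshift_password

  redshift_settings {
    # DMS stages CSVs in this bucket before COPYing them into Redshift.
    bucket_name             = aws_s3_bucket.dms_redshift_intermediate.bucket
    service_access_role_arn = aws_iam_role.dms_access_for_endpoint.arn
  }
}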

We have configured the IAM roles as described in the replication instance Terraform docs, creating all three of the dms-access-for-endpoint, dms-cloudwatch-logs-role and dms-vpc-role IAM roles. The IAM roles are deployed via a different stack to the one DMS is deployed in, as the roles are also used by another, successfully deployed DMS instance running a different task.

data "aws_iam_policy_document" "dms_assume_role_document" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      identifiers = [
        "s3.amazonaws.com",
        "iam.amazonaws.com",
        "redshift.amazonaws.com",
        "dms.amazonaws.com",
        "redshift-serverless.amazonaws.com"
      ]
      type        = "Service"
    }
  }
}

# Database Migration Service requires the below IAM Roles to be created before
# replication instances can be created. See the DMS Documentation for
# additional information: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Security.html#CHAP_Security.APIRole
#  * dms-vpc-role
#  * dms-cloudwatch-logs-role
#  * dms-access-for-endpoint
resource "aws_iam_role" "dms_access_for_endpoint" {
  name                  = "dms-access-for-endpoint"
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role"]
  force_detach_policies = true
}

resource "aws_iam_role" "dms_cloudwatch_logs_role" {
  name                  = "dms-cloudwatch-logs-role"
  description           = "Allow DMS to manage CloudWatch logs."
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole"]
  force_detach_policies = true
}

resource "aws_iam_role" "dms_vpc_role" {
  name                  = "dms-vpc-role"
  description           = "DMS IAM role for VPC permissions"
  assume_role_policy    = data.aws_iam_policy_document.dms_assume_role_document.json
  managed_policy_arns   = ["arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"]
  force_detach_policies = true
}

However, when the task runs we see the following logs in CloudWatch:

2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Not retriable error: <AccessDenied> Access Denied [1001705]  (anw_retry_strategy.cpp:118)
2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Failed to list bucket 'dms-sandbox-redshift-intermediate-storage': error code <AccessDenied>: Access Denied [1001713]  (s3_dir_actions.cpp:105)
2022-09-01T16:51:38 [SOURCE_UNLOAD   ]E:  Failed to list bucket 'dms-sandbox-redshift-intermediate-storage' [1001713]  (s3_dir_actions.cpp:209)

We also enabled S3 server access logging on the bucket itself to see whether that would give us more information. This is what we see (anonymised):

<id> dms-sandbox-redshift-intermediate-storage [01/Sep/2022:15:43:32 +0000] 10.128.69.80 arn:aws:sts::<account>:assumed-role/dms-access-for-endpoint/dms-session-for-replication-engine <code> REST.GET.BUCKET - "GET /dms-sandbox-redshift-intermediate-storage?delimiter=%2F&max-keys=1000 HTTP/1.1" 403 AccessDenied 243 - 30 - "-" "aws-sdk-cpp/1.8.80/S3/Linux/4.14.276-211.499.amzn2.x86_64 x86_64 GCC/4.9.3" - <code> SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3.eu-west-2.amazonaws.com TLSv1.2 -

The above suggests that dms-session-for-replication is the service in question receiving the AccessDenied response, but we were unable to ascertain what this is and how we can fix it.

We tried adding a bucket policy to the S3 bucket itself, but this did not work (this also covers the S3 server access logs bucket):

resource "aws_s3_bucket" "dms_redshift_intermediate" {
  # Prefixed with `dms-` as that's what the AmazonDMSRedshiftS3Role policy filters on
  bucket = "dms-sandbox-redshift-intermediate-storage"
}

resource "aws_s3_bucket_logging" "log_bucket" {
  bucket        = aws_s3_bucket.dms_redshift_intermediate.id
  target_bucket = aws_s3_bucket.log_bucket.id
  target_prefix = "log/"
}

resource "aws_s3_bucket" "log_bucket" {
  bucket = "${aws_s3_bucket.dms_redshift_intermediate.id}-logs"
}

resource "aws_s3_bucket_acl" "log_bucket" {
  bucket = aws_s3_bucket.log_bucket.id
  acl    = "log-delivery-write"
}

resource "aws_s3_bucket_policy" "dms_redshift_intermediate_policy" {
  bucket = aws_s3_bucket.dms_redshift_intermediate.id
  policy = data.aws_iam_policy_document.dms_redshift_intermediate_policy_document.json
}

data "aws_iam_policy_document" "dms_redshift_intermediate_policy_document" {
  statement {
    actions = [
      "s3:*"
    ]

    principals {
      identifiers = [
        "dms.amazonaws.com",
        "redshift.amazonaws.com"
      ]
      type = "Service"
    }

    resources = [
      aws_s3_bucket.dms_redshift_intermediate.arn,
      "${aws_s3_bucket.dms_redshift_intermediate.arn}/*"
    ]
  }
}

How do we solve the <AccessDenied> issues we are seeing in CloudWatch and enable data to be loaded into Redshift? DMS is able to PUT items into the S3 bucket, as we see encrypted CSVs appearing in there (and the server access logs confirm this), but DMS is unable to GET the files back out of it for Redshift. The AccessDenied responses also suggest that it is an IAM role issue rather than a security group issue, yet our IAM roles are configured as per the documentation, so we are confused as to what could be causing this.

You're right, this is an IAM role issue. Make sure the role in question has the following statements added to its policy document:

{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket"
  ],
  "Resource": "arn:aws:s3:::<yourbucketnamehere>"
},
{
  "Effect": "Allow",
  "Action": [
    "s3:ListAllMyBuckets",
    "s3:GetBucketLocation"
  ],
  "Resource": "arn:aws:s3:::*"
}
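If the role is managed with Terraform as in the question, a sketch of the equivalent as an inline policy attached to the dms-access-for-endpoint role could look like the following (this assumes the aws_s3_bucket.dms_redshift_intermediate and aws_iam_role.dms_access_for_endpoint resources from the question; the policy name is illustrative):

data "aws_iam_policy_document" "dms_s3_list_document" {
  # Allow listing the contents of the intermediate bucket itself.
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.dms_redshift_intermediate.arn]
  }

  # Allow bucket discovery and location lookups.
  statement {
    actions   = ["s3:ListAllMyBuckets", "s3:GetBucketLocation"]
    resources = ["arn:aws:s3:::*"]
  }
}

resource "aws_iam_role_policy" "dms_s3_list" {
  name   = "dms-s3-list"
  role   = aws_iam_role.dms_access_for_endpoint.id
  policy = data.aws_iam_policy_document.dms_s3_list_document.json
}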

What we thought was an IAM issue was actually a security group issue. Redshift's COPY command was struggling to access S3. By adding a 443 egress rule for HTTPS to the Redshift security group we were able to pull data through again:

resource "aws_security_group_rule" "https_443_egress" {
  type              = "egress"
  description       = "Allow HTTPS egress from the Redshift SG"
  protocol          = "tcp"
  to_port           = 443
  from_port         = 443
  security_group_id = aws_security_group.redshift.id
  cidr_blocks       = ["0.0.0.0/0"]
}

So if you are hitting the same issue as in the question, check whether Redshift can reach S3 over HTTPS.
