
Error when deploying cross account SageMaker Endpoints

I am using the CDK to deploy a SageMaker endpoint in a cross-account setup.

The following error appears when the SageMaker endpoint is created:

Failed to download model data for container "container_1" from URL: "s3://.../model.tar.gz". Please ensure that there is an object located at the URL and that the role passed to CreateModel has permissions to download the object.
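The error names two possible causes: a missing object, or a role that cannot read it. For an SSE-KMS encrypted object, reading it also requires being able to decrypt it with the object's KMS key, so it can help to check which key the artifact was actually encrypted with. S3's HeadObject response reports this in `ServerSideEncryption` and `SSEKMSKeyId` (with the AWS SDK for JavaScript v3, via `HeadObjectCommand` from `@aws-sdk/client-s3`). The sketch below is a hypothetical helper, `checkArtifactEncryption`, that compares those fields with the key ARN the bucket is supposed to use; it takes the fields as a plain object so it runs without the SDK.

```typescript
// Subset of the S3 HeadObject response fields relevant here; the AWS SDK for
// JavaScript v3 returns them on HeadObjectCommandOutput.
interface HeadObjectEncryptionInfo {
  ServerSideEncryption?: string; // e.g. "AES256" or "aws:kms"
  SSEKMSKeyId?: string; // ARN of the KMS key used for SSE-KMS
}

// Hypothetical helper: returns a description of the problem, or null when the
// object is SSE-KMS encrypted with exactly the expected key ARN.
function checkArtifactEncryption(
  head: HeadObjectEncryptionInfo,
  expectedKeyArn: string,
): string | null {
  if (head.ServerSideEncryption !== "aws:kms") {
    return `not SSE-KMS encrypted (ServerSideEncryption=${head.ServerSideEncryption ?? "none"})`;
  }
  if (head.SSEKMSKeyId === undefined) {
    return "SSE-KMS reported but SSEKMSKeyId is missing";
  }
  // Exact comparison: HeadObject reports the full key ARN, so expectedKeyArn
  // should also be a full key ARN (not a key ID or an alias).
  if (head.SSEKMSKeyId !== expectedKeyArn) {
    return `encrypted with ${head.SSEKMSKeyId}, expected ${expectedKeyArn}`;
  }
  return null;
}
```

A non-null result for an artifact in the shared bucket points at whatever wrote the object, rather than at the bucket or key policies.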

Here are some relevant details.

I have two accounts:

  • Account A: contains the encrypted S3 bucket where the model artifact is stored, the SageMaker model package group with the latest approved version, and a CodePipeline that deploys the endpoint both to Account A itself and to Account B.
  • Account B: contains the endpoint deployed by the CodePipeline in Account A.

In Account A:

  • Cross-account permissions are set on both the bucket and the KMS key used to encrypt that bucket:
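```typescript
import { Aws, RemovalPolicy } from "aws-cdk-lib";
import {
  AccountRootPrincipal,
  ArnPrincipal,
  Effect,
  PolicyDocument,
  PolicyStatement,
} from "aws-cdk-lib/aws-iam";
import { Key } from "aws-cdk-lib/aws-kms";
import { Bucket } from "aws-cdk-lib/aws-s3";

// Runs inside the Stack constructor in Account A; AccountA, AccountB and
// projectName are defined elsewhere in the original code.

// Create the bucket and KMS key used by the SageMaker pipeline

// KMS key
const sagemakerKmsKey = new Key(this, "SagemakerBucketKMSKey", {
  description: "key used for encryption of data in Amazon S3",
  enableKeyRotation: true,
  policy: new PolicyDocument({
    statements: [
      // Full access for the account that owns the key
      new PolicyStatement({
        actions: ["kms:*"],
        effect: Effect.ALLOW,
        resources: ["*"],
        principals: [new AccountRootPrincipal()],
      }),
      // Cross-account access for Account A and Account B
      new PolicyStatement({
        actions: ["kms:*"],
        effect: Effect.ALLOW,
        resources: ["*"],
        principals: [
          new ArnPrincipal(`arn:${Aws.PARTITION}:iam::${AccountA}:root`),
          new ArnPrincipal(`arn:${Aws.PARTITION}:iam::${AccountB}:root`),
        ],
      }),
    ],
  }),
});

// S3 bucket; passing encryptionKey makes KMS the default bucket encryption.
// Default encryption only applies to uploads that do not request their own key.
const sagemakerArtifactBucket = new Bucket(this, "SagemakerArtifactBucket", {
  bucketName: `mlops-${projectName}-${Aws.REGION}`,
  encryptionKey: sagemakerKmsKey,
  versioned: false,
  removalPolicy: RemovalPolicy.DESTROY,
});

// Allow both accounts to access the bucket and its objects
sagemakerArtifactBucket.addToResourcePolicy(
  new PolicyStatement({
    actions: ["s3:*"],
    resources: [
      sagemakerArtifactBucket.bucketArn,
      `${sagemakerArtifactBucket.bucketArn}/*`,
    ],
    principals: [
      new ArnPrincipal(`arn:${Aws.PARTITION}:iam::${AccountA}:root`),
      new ArnPrincipal(`arn:${Aws.PARTITION}:iam::${AccountB}:root`),
    ],
  }),
);
```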
  • A CloudFormation create/update stack action (CloudFormationCreateUpdateStackAction) deploys the SageMaker endpoint stack to Account A and Account B:
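```typescript
import { Aws, CfnCapabilities } from "aws-cdk-lib";
import { CloudFormationCreateUpdateStackAction } from "aws-cdk-lib/aws-codepipeline-actions";
import { Role } from "aws-cdk-lib/aws-iam";

// CloudFormation action that deploys the staging endpoint stack to Account B.
// projectName, AccountB and cdKSynthArtifact (the synth output artifact) are
// defined elsewhere in the original code.
const deployStagingAction = new CloudFormationCreateUpdateStackAction({
  actionName: "DeployStagingAction",
  runOrder: 1,
  adminPermissions: false,
  stackName: `${projectName}EndpointStaging`,
  templatePath: cdKSynthArtifact.atPath("staging.template.json"),
  replaceOnFailure: true,
  // Role the pipeline assumes in Account B to run this action
  role: Role.fromRoleArn(
    this,
    "StagingActionRole",
    `arn:${Aws.PARTITION}:iam::${AccountB}:role/cdk-hnb659fds-deploy-role-${AccountB}-${Aws.REGION}`,
  ),
  // Role CloudFormation uses to create the stack's resources in Account B
  deploymentRole: Role.fromRoleArn(
    this,
    "StagingDeploymentRole",
    `arn:${Aws.PARTITION}:iam::${AccountB}:role/cdk-hnb659fds-cfn-exec-role-${AccountB}-${Aws.REGION}`,
  ),
  cfnCapabilities: [CfnCapabilities.AUTO_EXPAND, CfnCapabilities.NAMED_IAM],
});
```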
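The two `Role.fromRoleArn` calls in the deploy action hard-code the names of roles created by `cdk bootstrap` in Account B: `role` is the role the pipeline assumes to run the action (the bootstrap deploy-role), and `deploymentRole` is the role CloudFormation uses to create the stack's resources (the cfn-exec-role). The sketch below is a hypothetical helper, `cdkBootstrapRoleArn`, that only builds those ARNs from the naming pattern `cdk-<qualifier>-<kind>-<account>-<region>`. It assumes the default qualifier `hnb659fds`, which changes if the account was bootstrapped with `cdk bootstrap --qualifier`; the account ID and Region below are placeholders.

```typescript
// Role kinds created by the default (modern) CDK bootstrap template; a subset.
type CdkBootstrapRoleKind =
  | "deploy-role"
  | "cfn-exec-role"
  | "file-publishing-role"
  | "image-publishing-role"
  | "lookup-role";

// Hypothetical helper: ARN of a bootstrap role, following the naming pattern
// cdk-<qualifier>-<kind>-<account>-<region>.
function cdkBootstrapRoleArn(
  kind: CdkBootstrapRoleKind,
  account: string,
  region: string,
  partition = "aws",
  qualifier = "hnb659fds", // default unless bootstrapped with --qualifier
): string {
  return `arn:${partition}:iam::${account}:role/cdk-${qualifier}-${kind}-${account}-${region}`;
}

// The two roles used by DeployStagingAction (placeholder account and Region)
const actionRoleArn = cdkBootstrapRoleArn("deploy-role", "222222222222", "eu-west-1");
// → "arn:aws:iam::222222222222:role/cdk-hnb659fds-deploy-role-222222222222-eu-west-1"
const deploymentRoleArn = cdkBootstrapRoleArn("cfn-exec-role", "222222222222", "eu-west-1");
// → "arn:aws:iam::222222222222:role/cdk-hnb659fds-cfn-exec-role-222222222222-eu-west-1"
```

In a CDK app the `Aws.PARTITION` and `Aws.REGION` tokens could be passed as the arguments, since CDK tokens can be embedded in strings and are resolved at deploy time.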

Specifically, the role that creates the SageMaker model and endpoint should be cdk-hnb659fds-cfn-exec-role, as seen in CloudTrail. For testing purposes I granted Administrator privileges to both roles, and the error still appears.
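One detail worth separating when reading CloudTrail: the role that *calls* CreateModel is not necessarily the role that downloads the model data. The error message refers to "the role passed to CreateModel", i.e. the execution role given in the request. The sketch below is a hypothetical helper, `sageMakerDeployCallers`, that lists which IAM role made each SageMaker deployment call in a set of CloudTrail records (for example, parsed from a CloudTrail log file). It models only the record fields it reads; for assumed-role sessions the role ARN sits in `userIdentity.sessionContext.sessionIssuer.arn`.

```typescript
// Minimal subset of a CloudTrail event record; only the fields read below.
interface CloudTrailRecord {
  eventSource: string; // e.g. "sagemaker.amazonaws.com"
  eventName: string; // e.g. "CreateModel"
  userIdentity: {
    type: string; // e.g. "AssumedRole"
    arn?: string; // assumed-role session ARN
    sessionContext?: { sessionIssuer?: { arn?: string } };
  };
}

interface SageMakerCall {
  eventName: string;
  callerRoleArn: string;
}

// SageMaker API calls made while deploying an endpoint stack.
const DEPLOY_EVENTS = new Set(["CreateModel", "CreateEndpointConfig", "CreateEndpoint"]);

// Hypothetical helper: for each SageMaker deployment call, report the IAM role
// behind it (session issuer), falling back to the identity ARN.
function sageMakerDeployCallers(records: CloudTrailRecord[]): SageMakerCall[] {
  return records
    .filter((r) => r.eventSource === "sagemaker.amazonaws.com" && DEPLOY_EVENTS.has(r.eventName))
    .map((r) => ({
      eventName: r.eventName,
      callerRoleArn:
        r.userIdentity.sessionContext?.sessionIssuer?.arn ?? r.userIdentity.arn ?? "unknown",
    }));
}
```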

The deployment in Account A succeeds, which confirms that the bucket location is correct.

NOTE: everything up to the SageMaker endpoint is deployed correctly.

I managed to find the issue.

The problem was that, even though the bucket was created with a custom KMS key, the artifacts stored in it are generated by an Estimator. If you do not specify the output_kms_key parameter, SageMaker encrypts the output with the default AWS managed KMS key for Amazon S3, which is different from the key used for the S3 bucket.
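The fix itself is the `output_kms_key` argument of the SageMaker Python SDK `Estimator`. Because the rest of this thread is TypeScript, the sketch below models the API field that, as far as we know, this argument is sent as: `OutputDataConfig.KmsKeyId` in the CreateTrainingJob request. The functions `artifactOutputConfig` and `isAwsManagedAlias` are hypothetical guards, not library calls. The reasoning behind them: when `KmsKeyId` is omitted, SageMaker falls back to the default AWS managed key for S3, and AWS managed keys (aliases `alias/aws/...`) have key policies that cannot be edited, so another account cannot be granted `kms:Decrypt` on them.

```typescript
// Subset of the OutputDataConfig structure of the SageMaker CreateTrainingJob
// API (other optional fields are omitted here).
interface OutputDataConfig {
  S3OutputPath: string;
  KmsKeyId?: string;
}

// AWS managed keys use aliases of the form alias/aws/<service>, either bare
// or inside an alias ARN (arn:...:alias/aws/s3). Only aliases are detected;
// a raw key ARN of an AWS managed key is not recognized by this check.
function isAwsManagedAlias(keyId: string): boolean {
  return /(^|:)alias\/aws\//.test(keyId);
}

// Hypothetical guard: build the output config only when an explicit customer
// managed key is given, so the artifact is not silently encrypted with the
// default AWS managed key for S3.
function artifactOutputConfig(s3OutputPath: string, kmsKeyId?: string): OutputDataConfig {
  if (kmsKeyId === undefined || kmsKeyId === "") {
    throw new Error("no KMS key given; SageMaker would fall back to the default AWS managed key for S3");
  }
  if (isAwsManagedAlias(kmsKeyId)) {
    throw new Error(`${kmsKeyId} is an AWS managed key; its key policy cannot grant another account access`);
  }
  return { S3OutputPath: s3OutputPath, KmsKeyId: kmsKeyId };
}
```

Passing the bucket key's ARN (in the CDK code, `sagemakerKmsKey.keyArn`) keeps the artifact encrypted with the key whose policy already grants Account B access.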

Even though the issue turned out not to be related to the cross-account permissions, I'll leave it here in case someone runs into a similar issue.

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address.
