
Unable to run AWS Glue Crawler due to IAM Permissions

I am unable to run a newly created AWS Glue Crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console

  1. Created a new crawler role AWSGlueServiceRoleDefault with the AWSGlueServiceRole and AmazonS3FullAccess managed policies
  2. The trust relationship contains:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  3. The user executing the crawler signs in via SSO and inherits arn:aws:iam::aws:policy/AdministratorAccess
  4. I even tried creating a new AWS user with all permissions
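For reference, the role setup described above can be sketched with the AWS CLI. This is a minimal sketch, assuming the role name from the question and the standard AWS-managed policy ARNs; the trust policy file is the JSON shown above saved locally:

```shell
# Save the trust relationship from the question as glue-trust.json,
# then create the role and attach the two managed policies.
aws iam create-role \
  --role-name AWSGlueServiceRoleDefault \
  --assume-role-policy-document file://glue-trust.json

# Standard AWS-managed policy for Glue service roles
aws iam attach-role-policy \
  --role-name AWSGlueServiceRoleDefault \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Broad S3 access, as described in the question
aws iam attach-role-policy \
  --role-name AWSGlueServiceRoleDefault \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```

Note that the Glue service-role managed policy lives under the `service-role/` path, which is easy to miss when attaching it by ARN.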

After starting the crawler, it fails within 8 seconds with the following error:

Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler

What other IAM permissions are needed?

If you're crawling tables and schemas via a JDBC connection to an external data store, make sure you have specified network options on the Glue connection. I got exactly the same error when those options were not specified, so I think the error message is somewhat misleading here.

Here's what I have defined for my crawlers:

  1. A role, e.g. AWSGlueServiceRoleDefault, with the AWSGlueServiceRole managed policy attached.

  2. Network options specified on your connections (VPC, subnet, and security groups).

  3. A NAT gateway created and attached to the subnet defined in step 2, so that a public IP is available for your crawler to connect to the external data store.

If you're trying to connect to RDS, a NAT gateway is not needed, since the crawler and the database are both inside the AWS network. Just define security group rules that allow the connection. Check the documentation here.
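Glue connections require a self-referencing inbound rule on the security group (the crawler's ENIs talk to each other within the same group). A minimal sketch with the AWS CLI, assuming a placeholder security group ID:

```shell
# sg-0abc123 is a placeholder: the security group used by the Glue
# connection (and, for RDS, attached to the database instance).
# Allow all TCP traffic from members of the same security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp \
  --port 0-65535 \
  --source-group sg-0abc123
```

The rule is "self-referencing" because `--source-group` names the same group the rule is added to; traffic from outside the group stays blocked.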

If S3 is the target data source, a VPC endpoint for S3 is recommended. Check the documentation here.
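A gateway-type VPC endpoint lets the crawler reach S3 without traversing a NAT. A minimal sketch, assuming placeholder VPC and route-table IDs and the us-east-1 region:

```shell
# vpc-0abc123 and rtb-0abc123 are placeholders for your VPC and the
# route table of the subnet the Glue connection uses.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc123
```

The service name is region-specific (`com.amazonaws.<region>.s3`), so adjust it to match the region your crawler runs in.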
