
Unable to run AWS Glue Crawler due to IAM Permissions

I am unable to run a newly created AWS Glue Crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console

  1. Created a new crawler role AWSGlueServiceRoleDefault with the AWSGlueServiceRole and AmazonS3FullAccess managed policies
  2. The trust relationship contains:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  3. The user executing the crawler signs in via SSO and inherits arn:aws:iam::aws:policy/AdministratorAccess
  4. I even tried creating a new AWS user with all permissions
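For reference, the role setup described above can be sketched with the AWS CLI. This is a minimal sketch, assuming the role name from the question and the standard AWS-managed policy ARNs; the trust policy file is the JSON shown above saved locally:

```shell
# Save the trust relationship from the question as glue-trust.json,
# then create the role and attach the two managed policies.
aws iam create-role \
  --role-name AWSGlueServiceRoleDefault \
  --assume-role-policy-document file://glue-trust.json

# Standard AWS-managed policy for Glue service roles
aws iam attach-role-policy \
  --role-name AWSGlueServiceRoleDefault \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Broad S3 access, as described in the question
aws iam attach-role-policy \
  --role-name AWSGlueServiceRoleDefault \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```

Note that the Glue service-role managed policy lives under the `service-role/` path, which is easy to miss when attaching it by ARN.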

After starting the crawler, it fails within 8 seconds with the following error:

Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler

What other IAM permissions are needed?

If you're crawling tables and schemas via a JDBC connection to an external data store, make sure you have specified network options on the Glue connection. I got exactly the same error when those options were not specified, so I think the error message is somewhat misleading here.

Here's what I have defined for my crawlers:

  1. A role, e.g. AWSGlueServiceRoleDefault, with the AWSGlueServiceRole managed policy attached.

  2. Network options specified on your connections (VPC, subnet, and security groups).

  3. A NAT gateway created and attached to the subnet defined in step 2, so that a public IP is available for your crawler to connect to the external data store.

If you're trying to connect to RDS, a NAT gateway is not needed, since the crawler and the database are both inside the AWS network. Just define security group rules that allow the connection. Check the documentation here.
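Glue connections require a self-referencing inbound rule on the security group (the crawler's ENIs talk to each other within the same group). A minimal sketch with the AWS CLI, assuming a placeholder security group ID:

```shell
# sg-0abc123 is a placeholder: the security group used by the Glue
# connection (and, for RDS, attached to the database instance).
# Allow all TCP traffic from members of the same security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc123 \
  --protocol tcp \
  --port 0-65535 \
  --source-group sg-0abc123
```

The rule is "self-referencing" because `--source-group` names the same group the rule is added to; traffic from outside the group stays blocked.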

If S3 is the target data source, a VPC endpoint for S3 is recommended. Check the documentation here.
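A gateway-type VPC endpoint lets the crawler reach S3 without traversing a NAT. A minimal sketch, assuming placeholder VPC and route-table IDs and the us-east-1 region:

```shell
# vpc-0abc123 and rtb-0abc123 are placeholders for your VPC and the
# route table of the subnet the Glue connection uses.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc123
```

The service name is region-specific (`com.amazonaws.<region>.s3`), so adjust it to match the region your crawler runs in.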
