Unable to run AWS Glue Crawler due to IAM Permissions
I am unable to run a newly created AWS Glue Crawler. I followed the IAM role guide at https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html?icmpid=docs_glue_console and created a new crawler role, AWSGlueServiceRoleDefault, with the AWSGlueServiceRole and AmazonS3FullAccess managed policies.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
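One way to rule out the trust relationship as the cause is to check the document programmatically. The sketch below is only an illustration: the helper name `allows_glue_to_assume` is hypothetical, and it handles only the simple string or list forms of `Principal.Service` and `Action`, not conditions or wildcards.

```python
import json

# The trust policy from the question, as a JSON string.
TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "glue.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
"""


def _as_list(value):
    """IAM accepts a single string or a list here; normalise to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def allows_glue_to_assume(policy_json: str) -> bool:
    """Hypothetical helper: True if some Allow statement lets Glue assume the role."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. "*", not handled in this sketch
            continue
        services = _as_list(principal.get("Service"))
        actions = _as_list(stmt.get("Action"))
        if (
            stmt.get("Effect") == "Allow"
            and "glue.amazonaws.com" in services
            and "sts:AssumeRole" in actions
        ):
            return True
    return False


print(allows_glue_to_assume(TRUST_POLICY))  # prints True
```

If this check passes, as it does for the policy above, the trust relationship is probably not what is stopping the crawler.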
arn:aws:iam::aws:policy/AdministratorAccess
After executing the crawler, it fails within 8 seconds with the following error:
Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler
What other IAM permissions are needed?
If you're crawling tables and schemas via a JDBC connection to an external data store, make sure you have specified the network options on the Glue Connection. I got exactly the same error when the options were not specified. I think the error message is somewhat misleading here.
Here's what I have defined for my crawlers:
1. A role, e.g. AWSGlueServiceRoleDefault, with the AWSGlueServiceRole managed policy attached.
2. Network options specified on the connections.
3. A NAT gateway, created and attached to the subnet defined in step 2, so that a public IP is available for the crawler to connect to the external data store.
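The network options on a connection can be sketched as data. The example below assumes the field names of the Glue `ConnectionInput` structure as I understand them (`PhysicalConnectionRequirements` with `SubnetId`, `SecurityGroupIdList`, `AvailabilityZone`). All identifiers, the JDBC URL, and the helper `missing_network_options` are hypothetical placeholders, and treating `SubnetId` and `SecurityGroupIdList` as the fields to check is my own assumption.

```python
# Fields this sketch treats as "the network options" (an assumption).
NETWORK_KEYS = ("SubnetId", "SecurityGroupIdList")


def missing_network_options(connection_input: dict) -> list:
    """Hypothetical helper: list the network fields that are absent or empty."""
    reqs = connection_input.get("PhysicalConnectionRequirements") or {}
    return [key for key in NETWORK_KEYS if not reqs.get(key)]


# Placeholder values throughout; replace with your own resources.
connection_input = {
    "Name": "example-jdbc-connection",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.internal:5432/exampledb",
        "USERNAME": "example_user",
        "PASSWORD": "example_password",
    },
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-EXAMPLE",
        "SecurityGroupIdList": ["sg-EXAMPLE"],
        "AvailabilityZone": "us-east-1a",
    },
}

print(missing_network_options(connection_input))          # prints []
print(missing_network_options({"Name": "no-network"}))    # prints ['SubnetId', 'SecurityGroupIdList']
```

A dictionary of this shape is what boto3's Glue client takes as `create_connection(ConnectionInput=...)`. The check is deliberately kept free of AWS calls so it can run before anything is created.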
If you're connecting to RDS, a NAT is not needed, since the crawler and the database are both in the AWS network. Just define security group rules that allow the connection. Check the document here.
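As a rough illustration of such a rule: as far as I know, AWS's guidance for Glue in a VPC is a self-referencing inbound rule that allows all TCP ports from the same security group. The sketch below builds that rule in the `IpPermissions` shape used by the EC2 API. The function name and the group ID are hypothetical, and the rule itself is my assumption about what the linked document describes.

```python
def self_referencing_tcp_rule(security_group_id: str) -> dict:
    """Hypothetical helper: inbound rule allowing all TCP from the group itself."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        # Referencing the group's own ID makes the rule self-referencing.
        "UserIdGroupPairs": [{"GroupId": security_group_id}],
    }


rule = self_referencing_tcp_rule("sg-EXAMPLE")  # placeholder ID
print(rule["UserIdGroupPairs"])  # prints [{'GroupId': 'sg-EXAMPLE'}]
```

With boto3, a rule like this would go into `authorize_security_group_ingress(GroupId=..., IpPermissions=[rule])` on the EC2 client.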
If S3 is the target data source, a VPC endpoint for S3 is recommended. Check the document here.
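For the S3 case, the endpoint request might look like the following sketch. It assumes a gateway-type endpoint and the usual `com.amazonaws.<region>.s3` service name. The helper name, VPC ID, region, and route table ID are all hypothetical placeholders.

```python
def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: list) -> dict:
    """Hypothetical helper: keyword arguments for an S3 gateway VPC endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints are attached to the route tables of the subnets.
        "RouteTableIds": list(route_table_ids),
    }


params = s3_gateway_endpoint_params("vpc-EXAMPLE", "us-east-1", ["rtb-EXAMPLE"])
print(params["ServiceName"])  # prints com.amazonaws.us-east-1.s3
```

These keys match the parameters of boto3's EC2 `create_vpc_endpoint`, so the dictionary could be passed as `create_vpc_endpoint(**params)` once real IDs are filled in.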