Creating a Glue job with AWS CDK (Python) fails
I'm using the Python wrappers for CDK to create a Glue job. The command attribute requires an object of type IResolvable | JobCommandProperty. I tried to put a JobCommandProperty object here, but I'm getting an exception.

I created a JobCommandProperty object. I was looking for a .builder() function somewhere (similar to the Java API), but couldn't find one.
from aws_cdk import (
    aws_glue as glue,
    aws_iam as iam,
    core
)


class ScheduledGlueJob(core.Stack):
    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        policy_statement = iam.PolicyStatement(
            actions=['logs:*', 's3:*', 'ec2:*', 'iam:*', 'cloudwatch:*', 'dynamodb:*', 'glue:*']
        )
        policy_statement.add_all_resources()

        glue_job_role = iam.Role(
            self,
            'Glue-Job-Role',
            assumed_by=iam.ServicePrincipal('glue.amazonaws.com')
        ).add_to_policy(
            policy_statement
        )

        job = glue.CfnJob(
            self,
            'glue-test-job',
            role=glue_job_role,
            allocated_capacity=10,
            command=glue.CfnJob.JobCommandProperty(
                name='glueetl',
                script_location='s3://my-bucket/glue-scripts/job.scala'
            ))
The error message is this:
$ cdk synth
Traceback (most recent call last):
  File "app.py", line 30, in <module>
    glue_job = ScheduledGlueJob(app, 'Cronned-Glue-Job')
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_runtime.py", line 66, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/glue/scheduled_job.py", line 33, in __init__
    script_location='s3://my-bucket/glue-scripts/job.scala'
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_runtime.py", line 66, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/aws_cdk/aws_glue/__init__.py", line 2040, in __init__
    jsii.create(CfnJob, self, [scope, id, props])
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/__init__.py", line 208, in create
    overrides=overrides,
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 331, in create
    return self._process.send(request, CreateResponse)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 316, in send
    raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: Expected 'string', got true (boolean)
Does anyone have a working CDK (Python) example that creates a CfnJob object?
Never mind: the role attribute must be of type string. I was confused by the JSII error message.
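On the .builder() question: the Python bindings do not expose Java-style builders; struct types such as glue.CfnJob.JobCommandProperty are constructed by passing every field as a keyword argument, which is what the question's code already does. The sketch below mimics that style with a plain dataclass so it runs without CDK installed. JobCommand is a hypothetical stand-in, not the real CDK class.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a CDK struct such as glue.CfnJob.JobCommandProperty.
# It is NOT the real class; it only mimics the keyword-argument style that the
# Python bindings use instead of Java's builder().
@dataclass(frozen=True)
class JobCommand:
    name: str
    script_location: str
    python_version: Optional[str] = None  # optional fields default to None


# Java:   JobCommandProperty.builder().name("glueetl").scriptLocation("s3://...").build()
# Python: pass each field as a keyword argument to the constructor instead.
command = JobCommand(
    name="glueetl",
    script_location="s3://my-bucket/glue-scripts/job.scala",
)

print(command.name, command.python_version)
```

Fields you leave out simply take their defaults, so there is nothing to "build" afterwards.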
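The glue_job_role variable is no longer a Role, because you chained .add_to_policy() onto the constructor: the variable holds whatever add_to_policy returns (the boolean true in the error message), not the role itself. CfnJob's role property also expects the role's ARN as a string rather than the Role object. The code below should work: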
glue_job_role = iam.Role(
    self,
    'Glue-Job-Role',
    assumed_by=iam.ServicePrincipal('glue.amazonaws.com')
)
# Call add_to_policy separately so glue_job_role keeps the Role object
glue_job_role.add_to_policy(
    policy_statement
)

job = glue.CfnJob(
    self,
    'glue-test-job',
    role=glue_job_role.role_arn,  # CfnJob expects the role ARN (a string)
    allocated_capacity=10,
    command=glue.CfnJob.JobCommandProperty(
        name='glueetl',
        script_location='s3://my-bucket/glue-scripts/job.scala'
    ))
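The underlying pitfall is ordinary Python semantics: in x = A().b(), x is bound to the return value of b(), not to the object A() created. The sketch below reproduces this without CDK. FakeRole is a hypothetical stand-in for iam.Role whose add_to_policy returns True, as the "got true (boolean)" in the error indicates for the CDK version used here; the account ID in the ARN is made up.

```python
class FakeRole:
    """Hypothetical stand-in for iam.Role; not the CDK class."""

    def __init__(self, name: str) -> None:
        # Made-up account ID, for illustration only
        self.role_arn = f"arn:aws:iam::123456789012:role/{name}"
        self.statements = []

    def add_to_policy(self, statement: str) -> bool:
        # Like the CDK method in the question, returns a boolean, not the role
        self.statements.append(statement)
        return True


# Chained: the variable receives add_to_policy's return value (True)
chained = FakeRole("Glue-Job-Role").add_to_policy("glue:*")

# Separate statements: the variable keeps the role object
role = FakeRole("Glue-Job-Role")
role.add_to_policy("glue:*")

print(type(chained).__name__)
print(role.role_arn)
```

Passing chained as role= is exactly what produced "Expected 'string', got true (boolean)"; passing role.role_arn hands over the string the property expects.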
Be aware that a crawler is not the same as a job; nonetheless, I think the permissions are similar. As of 16 August 2020, the following works for a crawler (unfortunately, none of the previous answers did):
from aws_cdk import (
    aws_iam as iam,
    aws_glue as glue,
    core
)


class MyDataScienceStack(core.Stack):
    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        statement = iam.PolicyStatement(
            actions=["s3:GetObject", "s3:PutObject"],
            resources=["arn:aws:s3:::mybucketname",
                       "arn:aws:s3:::mybucketname/data_warehouse/units/*"])
        write_to_s3_policy = iam.PolicyDocument(statements=[statement])

        glue_role = iam.Role(
            self, 'GlueCrawlerFormyDataScienceRole',
            role_name='GlueCrawlerFormyDataScienceRole',
            # inline_policies maps a policy name to a PolicyDocument;
            # 'WriteToS3' is an illustrative name
            inline_policies={'WriteToS3': write_to_s3_policy},
            assumed_by=iam.ServicePrincipal('glue.amazonaws.com'),
            managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name('service-role/AWSGlueServiceRole')]
        )

        glue_crawler = glue.CfnCrawler(
            self, 'glue-crawler-id',
            description="Glue Crawler for my-data-science-s3",
            name='any name',
            database_name='units',
            schedule={"scheduleExpression": "cron(5 * * * ? *)"},
            role=glue_role.role_arn,  # the role ARN string, as for CfnJob
            targets={"s3Targets": [{"path": "s3://mybucketname/data_warehouse/units"}]}
        )
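The crawler's schedule uses AWS's six-field cron syntax (minutes, hours, day-of-month, month, day-of-week, year), so cron(5 * * * ? *) should fire at minute 5 of every hour. The sketch below is a hypothetical local helper that splits such an expression into named fields and applies AWS's rule that one of the two day fields must be "?". It only checks the shape of the expression, not the values, and is not an AWS API.

```python
# Field order of AWS six-field cron expressions (as used by Glue schedules)
AWS_CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_aws_cron(expression: str) -> dict:
    """Split an AWS 'cron(...)' expression into its six named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"not a cron(...) expression: {expression!r}")
    fields = expression[len("cron("):-1].split()
    if len(fields) != len(AWS_CRON_FIELDS):
        raise ValueError(f"expected {len(AWS_CRON_FIELDS)} fields, got {len(fields)}")
    parsed = dict(zip(AWS_CRON_FIELDS, fields))
    # AWS does not allow both day fields to be set; one of them must be '?'
    if "?" not in (parsed["day_of_month"], parsed["day_of_week"]):
        raise ValueError("one of day_of_month / day_of_week must be '?'")
    return parsed


print(parse_aws_cron("cron(5 * * * ? *)"))
```

Running a check like this before cdk synth can catch a five-field Unix-style cron string, which AWS would reject at deploy time.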