I'm trying to create a Glue ETL job with boto3, using the script below. I want the job to be of type Spark, but the script creates a Python Shell job instead. It also doesn't disable bookmarks. Does anyone know what I need to add to make it a Spark job and disable bookmarks?
Script:
response = glue_assumed_client.create_job(
    Name='mlxxxx',
    Role='Awsxxxx',
    Command={
        'Name': 'mlxxxx',
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx',
        'PythonVersion': '3'
    },
    Connections={
        'Connections': [
            'sxxxx',
            'spxxxxxx',
        ]
    },
    Timeout=2880,
    MaxCapacity=10
)
See the boto3 documentation for create_job:
Command (dict) -- [REQUIRED] The JobCommand that executes this job.
    Name (string) -- The name of the job command. For an Apache Spark ETL job, this must be glueetl. For a Python shell job, it must be pythonshell.
You can reset the bookmark with:
client.reset_job_bookmark(
    JobName='string',
    RunId='string'
)
Only JobName is required. It can be taken from response['Name'], which create_job() returns. Note that this resets the stored bookmark state; it does not turn bookmarks off for the job.
To create a Spark job, set the command name to 'glueetl', as shown below. If you are not running a Python shell job, you do not need to specify the Python version in the Command parameters.
response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',  # use 'glueetl' to create a Spark job
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    Connections={
        'Connections': [
            'sxxxx',
            'spxxxxxx',
        ]
    },
    Timeout=2880,
    MaxCapacity=10
)
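To check which type was actually created, one option is to read the job back with get_job and look at Command.Name. This is only a rough sketch: it assumes you pass in a boto3 Glue client, and the helper name glue_job_type and the label mapping are my own, not part of boto3.

```python
# Hypothetical helper (not part of boto3): maps Command.Name to a readable label.
COMMAND_TO_TYPE = {
    "glueetl": "Spark",
    "pythonshell": "Python Shell",
}


def glue_job_type(client, job_name):
    """Return a readable job type for an existing Glue job.

    `client` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    job = client.get_job(JobName=job_name)["Job"]
    command_name = job["Command"]["Name"]
    # Fall back to the raw command name for anything not in the mapping.
    return COMMAND_TO_TYPE.get(command_name, command_name)
```

For a job created with the call above, glue_job_type(client, 'mlxxxyu') should come back as 'Spark'.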
Job bookmarks are disabled by default, so if you don't pass a bookmark parameter, the job is created with bookmarks disabled.
If you want to disable bookmarks explicitly, you can do so in DefaultArguments [1], as shown below.
response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    DefaultArguments={
        '--job-bookmark-option': 'job-bookmark-disable'
    },
    Timeout=2880,
    MaxCapacity=10
)
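If you create several jobs this way, it may help to build the create_job arguments in one place so the command name and the bookmark option can't be forgotten. A minimal sketch, not a definitive implementation: the function name build_spark_job_args and its parameters are hypothetical, and the Timeout and MaxCapacity values are simply copied from the examples above.

```python
def build_spark_job_args(name, role, script_location, connections=None,
                         disable_bookmarks=True):
    """Build keyword arguments for create_job describing a Spark ETL job.

    Hypothetical helper; the returned dict is meant to be passed as
    client.create_job(**args).
    """
    args = {
        "Name": name,
        "Role": role,
        # 'glueetl' is what makes this a Spark job rather than a Python shell job.
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "Timeout": 2880,     # value taken from the examples above
        "MaxCapacity": 10,   # value taken from the examples above
    }
    if connections:
        args["Connections"] = {"Connections": list(connections)}
    if disable_bookmarks:
        # Explicitly disable bookmarks instead of relying on the default.
        args["DefaultArguments"] = {
            "--job-bookmark-option": "job-bookmark-disable"
        }
    return args


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("glue")
# response = client.create_job(**build_spark_job_args(
#     "mlxxxyu", "Awsxxxx", "s3://aws-glue-scripts-xxxxx-us-west-2/xxxx",
#     connections=["sxxxx", "spxxxxxx"]))
```

Keeping the argument building separate from the API call also means the dict can be inspected or printed before anything is created in AWS.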