简体   繁体   中英

Specify job type when creating glue job with boto3

I'm trying to create a glue etl job. I'm using boto3. I'm using the script below. I want to create it as type=Spark, but the script below creates a type=Python Shell. Also it doesn't disable bookmarks. Does anyone know what I need to add to make it a type Spark and disable bookmarks?

script:

response = glue_assumed_client.create_job(
    Name='mlxxxx',
    Role='Awsxxxx',
    Command={
        'Name': 'mlxxxx',
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx',
        'PythonVersion': '3'
    },

    Connections={
        'Connections': [
            'sxxxx',
'spxxxxxx',
        ]
    },

    Timeout=2880,
    MaxCapacity=10
)

See the documentation .

Command (dict) -- [REQUIRED] The JobCommand that executes this job.

Name (string) -- The name of the job command. For an Apache Spark ETL job, this must be glueetl . For a Python shell job, it must be pythonshell.

You may reset the bookmark by using the function

client.reset_job_bookmark(
    JobName='string',
    RunId='string'
)

where the JobName is required. It can be obtained from the response['Name'] of the command create_job()

To create Spark jobs, you would have to mention the name of the command as 'glueetl' as described below and if you are not running a python shell job you need not specify the python version in the Command parameters

response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',     # <——   mention the name as glueetl to create spark job
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    Connections={
        'Connections': [
            'sxxxx',
'spxxxxxx',
        ]
    },

    Timeout=2880,
    MaxCapacity=10
)

Regarding job bookmarks, job bookmarks are disabled by default, so if you don't specify a parameter for a job bookmarks then the job created would have bookmarks disabled.

If you want to explicitly disable bookmarks, then you can specify the same in the Default Arguments[1] as shown below.

response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',
        'ScriptLocation': ‘s3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    DefaultArguments={
        '--job-bookmark-option': 'job-bookmark-disable'
    },
    Timeout=2880,
    MaxCapacity=10
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM