
Specify job type when creating glue job with boto3

I'm trying to create a Glue ETL job using boto3, with the script below. I want to create it as type=Spark, but the script below creates a type=Python Shell job. It also doesn't disable bookmarks. Does anyone know what I need to add to make it a Spark job and to disable bookmarks?

Script:

response = glue_assumed_client.create_job(
    Name='mlxxxx',
    Role='Awsxxxx',
    Command={
        'Name': 'mlxxxx',
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx',
        'PythonVersion': '3'
    },

    Connections={
        'Connections': [
            'sxxxx',
            'spxxxxxx',
        ]
    },

    Timeout=2880,
    MaxCapacity=10
)

See the documentation.

Command (dict) -- [REQUIRED] The JobCommand that executes this job.

Name (string) -- The name of the job command. For an Apache Spark ETL job, this must be glueetl. For a Python shell job, it must be pythonshell.
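That mapping between job type and Command name can be captured in a small helper. This is just a sketch (the helper and dict names are my own); the 'glueetl' and 'pythonshell' values come from the documentation quoted above:

```python
# Map the desired Glue job type to the Command name create_job() expects.
# Helper names are hypothetical; the values are from the Glue documentation.
COMMAND_NAMES = {
    "spark": "glueetl",            # Apache Spark ETL job
    "pythonshell": "pythonshell",  # Python shell job
}

def command_for(job_type, script_location):
    """Build the Command dict for create_job() for the given job type."""
    return {
        "Name": COMMAND_NAMES[job_type],
        "ScriptLocation": script_location,
    }
```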

You may reset the bookmark by using the function:

client.reset_job_bookmark(
    JobName='string',
    RunId='string'
)

where JobName is required. It can be obtained from response['Name'] in the output of create_job().
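One way to sketch that chaining (a hypothetical helper; glue_client would be a boto3 Glue client, and resetting the bookmark only matters once the job has actually run):

```python
def create_job_and_reset_bookmark(glue_client, **create_job_kwargs):
    """Create a Glue job, then reset its bookmark using the name
    returned by create_job(). Hypothetical wrapper around boto3 calls."""
    response = glue_client.create_job(**create_job_kwargs)
    # create_job() returns {'Name': <job name>}; reset_job_bookmark()
    # requires that name as its JobName parameter.
    glue_client.reset_job_bookmark(JobName=response["Name"])
    return response["Name"]
```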

To create Spark jobs, you have to set the name of the command to 'glueetl', as described below. If you are not running a Python shell job, you need not specify the Python version in the Command parameters.

response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',     # <-- set the Command name to glueetl to create a Spark job
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    Connections={
        'Connections': [
            'sxxxx',
            'spxxxxxx',
        ]
    },

    Timeout=2880,
    MaxCapacity=10
)

Regarding job bookmarks: they are disabled by default, so if you don't specify a bookmark parameter, the created job will have bookmarks disabled.

If you want to explicitly disable bookmarks, you can specify that in the Default Arguments[1] as shown below.

response = client.create_job(
    Name='mlxxxyu',
    Role='Awsxxxx',
    Command={
        'Name': 'glueetl',
        'ScriptLocation': 's3://aws-glue-scripts-xxxxx-us-west-2/xxxx'
    },
    DefaultArguments={
        '--job-bookmark-option': 'job-bookmark-disable'
    },
    Timeout=2880,
    MaxCapacity=10
)
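Putting both points together, the create_job arguments for a Spark job with bookmarks explicitly disabled can be built like this (a sketch; the helper name and its defaults are my own assumptions, not part of the boto3 API):

```python
def spark_job_kwargs(name, role, script_location, disable_bookmarks=True):
    """Build keyword arguments for glue_client.create_job() describing a
    Spark ETL job (Command Name 'glueetl'). Hypothetical helper."""
    kwargs = {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",  # 'glueetl' => Spark ETL; 'pythonshell' => Python shell
            "ScriptLocation": script_location,
        },
        "Timeout": 2880,
        "MaxCapacity": 10,
    }
    if disable_bookmarks:
        # Explicitly disable job bookmarks (they are off by default anyway).
        kwargs["DefaultArguments"] = {"--job-bookmark-option": "job-bookmark-disable"}
    return kwargs
```

The result can then be passed straight through as `glue_client.create_job(**spark_job_kwargs(...))`.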

