How to pass a parameter to a Glue job using boto3 create_job programmatically

I am creating a Glue job with the boto3 create_job call and trying to pass a default argument value for the path location, so that the job can run against files in different S3 buckets.

The script below is sample code that creates a Glue ETL job. How can I pass a parameter to the source path using args?

Sample script:

import boto3
import json
client = boto3.client('glue')
response = client.create_job(
   Name='jobname',
   Description='Glue Job',
   LogUri='s3://bucket/logs/',
   Role='arn:aws:iam::',
   ExecutionProperty={
       'MaxConcurrentRuns': 3
   },
   Command={
       'Name': 'glue',
       'ScriptLocation': 's3://bucketname/gluejob.py',
       'PythonVersion': '3'
   },
   MaxRetries = 1,
   Timeout=123,
   GlueVersion='3.0',
   NumberOfWorkers=2,
   WorkerType='G.1X',
   DefaultArguments = {'s3sourcepath':'s3://bucketname/csvfile.csv'},
   CodeGenConfigurationNodes = {
   'node-1':{
       'S3CsvSource': {
               'Name': 's3_source',
               'Paths': [
                   args['s3sourcepath'],  # <------ how do I pass the default argument here?
               ],
               'Separator': 'comma',
               'QuoteChar': 'quote',
               'WithHeader': True,
               'WriteHeader': True
           }
       }
   }
)

Thanks in advance.

You first need to retrieve the arguments that you passed, using getResolvedOptions. Something like this:

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['s3sourcepath'])

Now you should be able to use args['s3sourcepath'].

You can read this for more info.
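One point worth spelling out: args only exists inside the running job script, after getResolvedOptions has parsed sys.argv. In the script that calls create_job, the path has to come from an ordinary Python variable. The following is a minimal sketch under that assumption, not a definitive implementation. The helper name build_job_definition and the role placeholder are hypothetical, and the "--" prefix on the parameter name follows the Glue convention for job parameters. It only builds the keyword arguments, so it runs without AWS access.

```python
def build_job_definition(job_name, role_arn, script_location, source_path):
    """Return keyword arguments for create_job (no AWS call is made here)."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "NumberOfWorkers": 2,
        "WorkerType": "G.1X",
        # Job parameter names carry a leading "--"; inside the job script the
        # value is then read as args["s3sourcepath"].
        "DefaultArguments": {"--s3sourcepath": source_path},
        "CodeGenConfigurationNodes": {
            "node-1": {
                "S3CsvSource": {
                    "Name": "s3_source",
                    # A plain string known at creation time, not args[...].
                    "Paths": [source_path],
                    "Separator": "comma",
                    "QuoteChar": "quote",
                }
            }
        },
    }


job_kwargs = build_job_definition(
    job_name="jobname",
    role_arn="<your-glue-role-arn>",  # placeholder, supply your own role ARN
    script_location="s3://bucketname/gluejob.py",
    source_path="s3://bucketname/csvfile.csv",
)
# To actually create the job (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**job_kwargs)
```

Keeping the path in one variable means the default argument and the source node cannot drift apart.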

I am not sure whether these run-time parameters can be set while creating a Glue job. Can you try setting the run-time parameters when you call start_job_run()? You can refer here for code samples:

response = client.start_job_run(
    JobName='my_test_Job',
    Arguments={
        '--s3sourcepath': 's3 path',
    }
)
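Arguments passed to start_job_run apply to that run only and take the place of the job's default arguments for it. As a rough sketch of how the per-run dictionary could be built, the hypothetical helper to_glue_arguments below adds the "--" prefix to each parameter name. start_run is likewise an illustrative wrapper and is not called here, since it needs boto3 and AWS credentials.

```python
def to_glue_arguments(params):
    """Prefix each parameter name with "--" unless it already has it."""
    return {
        (name if name.startswith("--") else "--" + name): str(value)
        for name, value in params.items()
    }


def start_run(job_name, params):
    """Start a job run with per-run arguments (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("glue")
    response = client.start_job_run(
        JobName=job_name,
        Arguments=to_glue_arguments(params),
    )
    return response["JobRunId"]


# Pure-Python part only; no AWS call is made here.
run_arguments = to_glue_arguments({"s3sourcepath": "s3://bucketname/csvfile.csv"})
```

Inside the job script the value is still read with getResolvedOptions(sys.argv, ['s3sourcepath']), without the leading dashes.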

In your code, the job command name is given as glue.

Command={
   'Name': 'glue',
   'ScriptLocation': 's3://bucketname/gluejob.py',
   'PythonVersion': '3'
},

But the documentation here says it should be glueetl:

Command={
       'Name': 'glueetl',
       'ScriptLocation': 's3://bucketname/gluejob.py',
       'PythonVersion': '3'
   },

Can you try with:

'Name': 'glueetl'
