How to pass parameters to a Glue job programmatically with boto3 create_job

I am creating a Glue job with a boto3 create_job script and trying to pass a default argument value for the path location, so that the same job can run against different S3 bucket files.

The script below is sample code that creates a Glue ETL job. How do I pass parameters to the source path using args?

Sample script:

import boto3
import json
client = boto3.client('glue')
response = client.create_job(
   Name='jobname',
   Description='Glue Job',
   LogUri='s3://bucket/logs/',
   Role='arn:aws:iam::',
   ExecutionProperty={
       'MaxConcurrentRuns': 3
   },
   Command={
       'Name': 'glue',
       'ScriptLocation': 's3://bucketname/gluejob.py',
       'PythonVersion': '3'
   },
   MaxRetries = 1,
   Timeout=123,
   GlueVersion='3.0',
   NumberOfWorkers=2,
   WorkerType='G.1X',
   DefaultArguments={'s3sourcepath': 's3://bucketname/csvfile.csv'},
   CodeGenConfigurationNodes={
       'node-1': {
           'S3CsvSource': {
               'Name': 's3_source',
               'Paths': [
                   args['s3sourcepath'],  # <-- how do I pass the default argument here?
               ],
               'Separator': 'comma',
               'QuoteChar': 'quote',
               'WithHeader': True,
               'WriteHeader': True
           }
       }
   }
)

Thanks in advance.

You first need to retrieve the arguments you passed by calling getResolvedOptions inside the Glue job script itself (gluejob.py), not in the script that calls create_job. Something like this:

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['s3sourcepath'])

Now you should be able to use args['s3sourcepath'].
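To exercise the script logic locally, where the awsglue library is usually not installed, a rough stand-in can parse the same `--s3sourcepath value` shape that Glue places in sys.argv. This is a sketch under assumptions: the helper name resolve_options_locally is hypothetical, it uses argparse rather than getResolvedOptions, and in the deployed job you would keep the getResolvedOptions call shown above.

```python
import argparse
from typing import Dict, List, Sequence


def resolve_options_locally(argv: Sequence[str], options: List[str]) -> Dict[str, str]:
    """Hypothetical local stand-in for awsglue.utils.getResolvedOptions.

    Parses '--name value' pairs from argv (argv[0] is the script name) and
    returns a dict keyed by the bare option names.
    """
    # allow_abbrev=False stops a prefix like '--s3' from matching '--s3sourcepath'
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in options:
        # dest=name keeps the key exactly as requested (no dash-to-underscore change)
        parser.add_argument(f"--{name}", dest=name, required=True)
    # parse_known_args ignores arguments we did not ask for (e.g. ones Glue adds itself)
    known, _extras = parser.parse_known_args(list(argv[1:]))
    return vars(known)


# Example with a made-up argv in the shape a Glue run passes to the script:
# resolve_options_locally(
#     ["gluejob.py", "--JOB_ID", "j_123", "--s3sourcepath", "s3://bucketname/csvfile.csv"],
#     ["s3sourcepath"],
# )  -> {"s3sourcepath": "s3://bucketname/csvfile.csv"}
```

Unrequested arguments are skipped rather than rejected, which mirrors how the job only looks up the names it asks for.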

You can read this for more info.

I am not sure whether these run-time parameters can be set while creating a Glue job. Can you try setting them when you call start_job_run()? You can refer here for code samples.

response = client.start_job_run(
    JobName='my_test_Job',
    Arguments={
        '--s3sourcepath': 's3 path',
    }
)
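Note the '--' prefix on the key: the job script receives it as --s3sourcepath on its command line, and getResolvedOptions is then called with the bare name 's3sourcepath'. A small sketch that adds the prefix and assembles the start_job_run keyword arguments; both helper names are hypothetical, and the boto3 client is assumed to be created separately.

```python
from typing import Dict, Mapping


def to_glue_arguments(params: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of params with every key prefixed by '--' (hypothetical helper)."""
    # Keys that already start with '--' are left unchanged
    return {key if key.startswith("--") else f"--{key}": value for key, value in params.items()}


def build_start_job_run_kwargs(job_name: str, params: Mapping[str, str]) -> Dict[str, object]:
    """Keyword arguments for glue_client.start_job_run(...) (hypothetical helper)."""
    return {"JobName": job_name, "Arguments": to_glue_arguments(params)}


# Usage (requires AWS credentials; not executed here):
# import boto3
# client = boto3.client("glue")
# response = client.start_job_run(
#     **build_start_job_run_kwargs("my_test_Job", {"s3sourcepath": "s3://bucketname/other.csv"})
# )
# print(response["JobRunId"])
```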

In your code, the job command name is given as glue:

Command={
    'Name': 'glue',
    'ScriptLocation': 's3://bucketname/gluejob.py',
    'PythonVersion': '3'
},

But the documentation here says it should be glueetl, the command name for an Apache Spark ETL job:

Command={
       'Name': 'glueetl',
       'ScriptLocation': 's3://bucketname/gluejob.py',
       'PythonVersion': '3'
   },

Can you try with:

'Name': 'glueetl'
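A create_job call can also carry the default path itself. The sketch below only builds the keyword arguments; the helper name build_create_job_kwargs is hypothetical, and CodeGenConfigurationNodes, Description, LogUri and Timeout are left out to keep it short. It uses 'glueetl' as the command name and stores the default under DefaultArguments with the '--' prefix, so that getResolvedOptions(sys.argv, ['s3sourcepath']) can read it inside the job script. As the boto3 documentation describes it, Arguments passed to start_job_run replace these defaults for that run.

```python
from typing import Dict


def build_create_job_kwargs(
    job_name: str,
    role_arn: str,
    script_location: str,
    default_source_path: str,
) -> Dict[str, object]:
    """Keyword arguments for glue_client.create_job(...) (hypothetical helper)."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Apache Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "MaxRetries": 1,
        "ExecutionProperty": {"MaxConcurrentRuns": 3},
        # The '--' prefix is what lets getResolvedOptions find 's3sourcepath'
        "DefaultArguments": {"--s3sourcepath": default_source_path},
    }


# Usage (requires AWS credentials and a real IAM role ARN; not executed here):
# import boto3
# client = boto3.client("glue")
# client.create_job(**build_create_job_kwargs(
#     "jobname", role_arn, "s3://bucketname/gluejob.py", "s3://bucketname/csvfile.csv"))
```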
