
Using AWS Glue job triggers to start jobs with different parameters

I am using AWS Glue ETL scripts and triggers to run a number of jobs on data in S3. I have written four jobs, each of which takes specific parameters depending on the data we want to run it against. We want to share one script per job and pass in different parameters for each run. For example, job-A has two sets of parameters, data1 and data2; we set up one trigger to start job-B with data1 after job-A with data1 succeeds, and a separate trigger to start job-B with data2 after job-A with data2 succeeds.

Looking into job triggers, we can create triggers that start a job when the previous job succeeds, which is what we want: when job-A run with data1 succeeds, start job-B with data1; when job-A run with data2 succeeds, start job-B with data2. However, because both runs share the same job, a trigger configured on the success of job-A fires regardless of which parameters (data1 vs. data2) were passed in. So if job-A with data1 succeeds, two instances of job-B are kicked off: one with data1 and one with data2.
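To make the problem concrete, here is a rough sketch of the two trigger definitions described above, written as the keyword arguments one would pass to boto3's `create_trigger`. The helper `build_conditional_trigger` and the trigger names are hypothetical, and `--data` is assumed to be the job parameter name. The point it illustrates is that the predicate only refers to the job name and its state, so both predicates come out identical.

```python
def build_conditional_trigger(trigger_name, upstream_job, downstream_job, data_value):
    """Build kwargs for glue_client.create_trigger(**kwargs).

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        # The condition matches on job name and run state only;
        # it has no field for the arguments the upstream run received.
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        # The arguments are attached to the downstream action instead.
        "Actions": [
            {"JobName": downstream_job, "Arguments": {"--data": data_value}}
        ],
    }


trigger_data1 = build_conditional_trigger("job-B-after-job-A-data1", "job-A", "job-B", "data1")
trigger_data2 = build_conditional_trigger("job-B-after-job-A-data2", "job-A", "job-B", "data2")

# Same predicate, different actions: any successful job-A run satisfies both.
print(trigger_data1["Predicate"] == trigger_data2["Predicate"])
```

Since the two predicates are equal, either trigger is satisfied by any successful run of job-A, which matches the double start of job-B described above.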

Ideally, we would like each trigger to start the job only with the matching set of parameters, so we can share the Glue ETL job scripts and pass parameters only through the triggers.

Is there a way to achieve this without creating different versions of the scripts?

I'm having a similar issue, and I'm afraid that at the moment the only solution is to start the next job manually at the end of the previous job (e.g. using boto3):

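import boto3

[...]

# Runs at the end of the job script; data1 / data2 are assumed to be
# set by the elided code above.
client = boto3.client('glue', region_name='us-east-1')
if data1:
    client.start_job_run(
        JobName='job-A',
        Arguments={'--data': data1},
    )
elif data2:
    client.start_job_run(
        JobName='job-B',
        Arguments={'--data': data2},
    )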
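For the job-A to job-B chain from the question, the same idea might look like the sketch below: job-A forwards whatever `--data` value it was started with to job-B as its last step. The function name `start_next_job` is hypothetical, and the Glue client is passed in so the helper can be exercised without AWS access. Inside a real Glue script the value would typically be read with `getResolvedOptions` from `awsglue.utils`, shown here only in comments.

```python
def start_next_job(glue_client, next_job_name, data_value):
    """Start the downstream job with the same --data value as the current run.

    glue_client is expected to behave like boto3.client('glue').
    Returns the JobRunId reported by start_job_run.
    """
    response = glue_client.start_job_run(
        JobName=next_job_name,
        # Glue job arguments are passed as strings.
        Arguments={"--data": str(data_value)},
    )
    return response["JobRunId"]


# Possible usage at the very end of job-A's script (not executed here):
#
#   import sys
#   import boto3
#   from awsglue.utils import getResolvedOptions
#
#   args = getResolvedOptions(sys.argv, ["data"])
#   glue = boto3.client("glue", region_name="us-east-1")
#   start_next_job(glue, "job-B", args["data"])
```

With this approach no conditional trigger on job-A is needed for job-B, so only the run with the matching parameters is started. The trade-off is that the chaining logic lives in the script rather than in the trigger configuration.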
