
How to have a Python Glue job return when called in a Step Function?

I have a Glue job, written in Python, that I call from a Step Function. The Step Function successfully starts the job, and the job finishes successfully, but the Step Function never moves on to the next step. Is there some configuration or permission the Step Function needs in order to respond to job success? Is there something I need to do in the Python script?

Here is the Step Function (state machine) definition:

"MyGlueTask": {
  "Type": "Task",
  "Resource": "arn:aws:states:::glue:startJobRun.sync",
  "Parameters": {
    "JobName": "my_glue_job"
  },
  "ResultPath": "$.MyGlueTask",
  "Next": "NextGlueJob"
}

Are you sure it never moves to the next step? Maybe it does, but only after, say, 5 minutes?

I'm asking because Step Functions has a limitation: even if your Glue job finishes in a few seconds, Step Functions actually polls the Glue job for its result only once every 5 minutes.

One workaround is to change arn:aws:states:::glue:startJobRun.sync to arn:aws:states:::glue:startJobRun. The task then just triggers the Glue job and moves on to the next step immediately.

Most likely, you will need to wait until the Glue job has finished and get some result out of it. Therefore, you need to wrap the previous state with a few more.

  1. The main purpose of the first state is merely to start the Glue job. Apart from that, we need the Glue job's JobRunId. I don't know if it can be retrieved from the Glue job task itself, so I created a Lambda that runs the Glue job using the boto3 start_job_run function and then gets the JobRunId from the response.
  2. Create a Lambda that fetches the status (JobRunState) of the Glue job (via the boto3 get_job_run function) using the JobRunId from the previous step.
  3. Using the Step Functions Wait state type, run the Lambda you created every N seconds.
  4. Use the Choice state type to branch on the Glue job status.
    • If RUNNING, go back to the Wait step.
    • If SUCCEEDED, go ahead to the next state.
    • If FAILED or STOPPED, go wherever else.
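The two Lambdas from steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the handler names, the event shape (a `JobName` key in the input), and the optional `glue` parameter used for testing are all assumptions. The boto3 calls `start_job_run` and `get_job_run` are real, and the response of `start_job_run` carries the run id under the key `JobRunId`.

```python
def _glue_client():
    # Imported lazily so the module can be loaded (and unit-tested
    # with a fake client) on a machine without boto3 installed.
    import boto3
    return boto3.client("glue")


def start_job_handler(event, context, glue=None):
    """Step 1: start the Glue job and return its run id.

    Assumes the state input looks like {"JobName": "my_glue_job"}.
    """
    glue = glue or _glue_client()
    response = glue.start_job_run(JobName=event["JobName"])
    return {"JobName": event["JobName"], "JobRunId": response["JobRunId"]}


def check_status_handler(event, context, glue=None):
    """Step 2: look up the current state of the job run.

    Expects the output of start_job_handler as its input and adds
    a "JobRunState" key (e.g. RUNNING, SUCCEEDED, FAILED, STOPPED).
    """
    glue = glue or _glue_client()
    response = glue.get_job_run(
        JobName=event["JobName"],
        RunId=event["JobRunId"],
    )
    return {**event, "JobRunState": response["JobRun"]["JobRunState"]}
```

Returning the job name and run id from each handler keeps them in the state data, so the status Lambda can be called repeatedly from the Wait loop without any extra parameter mapping.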

Finally, it looks something like this.

The solution to my actual problem was permissions. You need four permissions when running startJobRun.sync:
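The original diagram is not reproduced here. As a rough sketch of how the four steps above might be wired together, the following Python builds a state machine definition as a dictionary and prints it as JSON. The state names, the two Lambda ARNs, the 30-second default, and the `NextGlueJob` placeholder are hypothetical. It also assumes the status Lambda returns a `JobRunState` field in its output, and it treats STARTING and STOPPING as in-progress states alongside RUNNING, which goes slightly beyond the list above.

```python
import json

# Job run states that mean "not finished yet" (assumption: STARTING and
# STOPPING are handled the same way as RUNNING).
IN_PROGRESS_STATES = ("STARTING", "RUNNING", "STOPPING")


def build_polling_definition(start_lambda_arn, status_lambda_arn, wait_seconds=30):
    """Return a start -> wait -> check -> choice polling loop."""
    keep_waiting = [
        {"Variable": "$.JobRunState", "StringEquals": state, "Next": "WaitForJob"}
        for state in IN_PROGRESS_STATES
    ]
    succeeded = [
        {"Variable": "$.JobRunState", "StringEquals": "SUCCEEDED", "Next": "NextGlueJob"}
    ]
    return {
        "StartAt": "StartGlueJob",
        "States": {
            "StartGlueJob": {
                "Type": "Task",
                "Resource": start_lambda_arn,
                "Next": "WaitForJob",
            },
            "WaitForJob": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "CheckJobStatus",
            },
            "CheckJobStatus": {
                "Type": "Task",
                "Resource": status_lambda_arn,
                "Next": "IsJobFinished",
            },
            "IsJobFinished": {
                "Type": "Choice",
                "Choices": keep_waiting + succeeded,
                # FAILED, STOPPED and anything unexpected end up here.
                "Default": "JobFailed",
            },
            "JobFailed": {
                "Type": "Fail",
                "Error": "GlueJobFailed",
                "Cause": "Glue job did not succeed",
            },
            # Placeholder for whatever state comes next in the real workflow.
            "NextGlueJob": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    definition = build_polling_definition(
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:start-glue-job",
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:check-glue-job",
    )
    print(json.dumps(definition, indent=2))
```

Sending unknown statuses to the failure state rather than back to the Wait step avoids a loop that never terminates if the job ends in a state the Choice does not list.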

  • glue:StartJobRun
  • glue:GetJobRun
  • glue:GetJobRuns
  • glue:BatchStopJobRun

Those are actually the Terraform values, but they should help anybody struggling with this.
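For anyone not using Terraform, the same four actions can be written as a plain IAM policy document attached to the state machine's execution role. The snippet below is a minimal sketch that builds such a document in Python. The function name is hypothetical, and the default `"*"` resource is an assumption made for brevity; in practice you would probably narrow it to the ARN of the specific Glue job.

```python
import json

# The four Glue actions listed above for startJobRun.sync.
GLUE_SYNC_ACTIONS = [
    "glue:StartJobRun",
    "glue:GetJobRun",
    "glue:GetJobRuns",
    "glue:BatchStopJobRun",
]


def build_glue_sync_policy(resource="*"):
    """Return an IAM policy document allowing the four Glue actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(GLUE_SYNC_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_glue_sync_policy(), indent=2))
```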

