How to access a run property of an AWS Glue workflow in a Glue job?
I have been using AWS Glue workflows to orchestrate batch jobs. We need to pass a push-down predicate to limit how much data each batch job processes. When we run a Glue job on its own, we can pass the push-down predicate as a command-line argument at run time (i.e. aws glue start-job-run --job-name foo.scala --arguments --arg1-text ${arg1}..). But when a Glue workflow executes the jobs, it is unclear how to do this.
When we orchestrate batch jobs with AWS Glue workflows, we can add run properties while creating the workflow.
I tried:
aws glue start-workflow-run --name workflow-name | jq -r '.RunId'
aws glue put-workflow-run-properties --name workflow-name --run-id "ID" --run-properties --pushdownpredicate="some value"
I can see the run property I passed with put-workflow-run-properties when I query the run:
aws glue get-workflow-run-properties --name workflow-name --run-id "ID"
But I cannot find "pushdownpredicate" in my Glue job. Any idea how to access a workflow's run property in a Glue job?
If you are using Python as the programming language for your Glue job, you can issue a get_workflow_run_properties API call to retrieve the property and use it inside your Glue job.
import boto3

client = boto3.client('glue')
response = client.get_workflow_run_properties(
    Name='string',
    RunId='string'
)
This returns the response below, which you can parse and use:
{
    'RunProperties': {
        'string': 'string'
    }
}
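As a minimal sketch of the parsing step, the helper below pulls a single key out of the "RunProperties" mapping. The function name read_run_property and its default argument are illustrative and not part of any AWS API; the client is passed in so the helper can be exercised without AWS credentials.

```python
from typing import Any, Dict, Optional


def read_run_property(
    glue_client: Any,
    workflow_name: str,
    run_id: str,
    key: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return one workflow run property, or `default` if it is not set."""
    response: Dict[str, Any] = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id
    )
    # "RunProperties" is a plain str -> str mapping in the response.
    return response.get("RunProperties", {}).get(key, default)


# In a real Glue job the client would come from boto3, for example:
#   import boto3
#   predicate = read_run_property(
#       boto3.client("glue"), "workflow-name", "ID", "pushdownpredicate"
#   )
```

Returning a default instead of raising keeps the job usable when the property was never set on the run.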
If you are using Scala, you can use the equivalent AWS SDK call.
First, you need to make sure that the job is running from a workflow:
from typing import Dict, Optional

import boto3


def get_worfklow_params(args: Dict[str, str]) -> Optional[Dict[str, str]]:
    """
    get_worfklow_params is delegated to retrieve the WORKFLOW parameters
    """
    glue_client = boto3.client("glue")
    # Both arguments are only present when the job was started by a workflow
    if "WORKFLOW_NAME" in args and "WORKFLOW_RUN_ID" in args:
        workflow_args = glue_client.get_workflow_run_properties(
            Name=args['WORKFLOW_NAME'], RunId=args['WORKFLOW_RUN_ID'])["RunProperties"]
        print("Found the following workflow args: \n{}".format(workflow_args))
        return workflow_args
    print("Unable to find run properties for this workflow!")
    return None
This method returns a map of the workflow's input parameters.
Then you can use the following method to retrieve a given parameter:
from typing import Dict, Optional


def get_worfklow_param(args: Optional[Dict[str, str]], arg: str) -> Optional[str]:
    """
    get_worfklow_param is delegated to verify if the given parameter is present in the job and return it. In case of no presence None will be returned
    """
    if args is None:
        return None
    return args[arg] if arg in args else None
To reuse the code, in my opinion it is better to package it as a Python (whl) module and set that module in the script path of your job. That way, you can access the methods with a simple import.
Without the whl module, you can proceed as follows:
def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    import boto3
    import sys
    from typing import Dict, Optional
    from awsglue.utils import getResolvedOptions

    def get_worfklow_params(args: Dict[str, str]) -> Optional[Dict[str, str]]:
        """
        get_worfklow_params is delegated to retrieve the WORKFLOW parameters
        """
        glue_client = boto3.client("glue")
        if "WORKFLOW_NAME" in args and "WORKFLOW_RUN_ID" in args:
            workflow_args = glue_client.get_workflow_run_properties(
                Name=args['WORKFLOW_NAME'], RunId=args['WORKFLOW_RUN_ID'])["RunProperties"]
            print("Found the following workflow args: \n{}".format(workflow_args))
            return workflow_args
        print("Unable to find run properties for this workflow!")
        return None

    def get_worfklow_param(args: Optional[Dict[str, str]], arg: str) -> Optional[str]:
        """
        get_worfklow_param is delegated to verify if the given parameter is present in the job and return it. In case of no presence None will be returned
        """
        if args is None:
            return None
        return args[arg] if arg in args else None

    _args = getResolvedOptions(sys.argv, ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
    worfklow_params = get_worfklow_params(_args)
    job_run_id = get_worfklow_param(_args, "WORKFLOW_RUN_ID")
    # Custom parameters are workflow run properties, so look them up in
    # worfklow_params rather than in the job arguments
    my_parameter = get_worfklow_param(worfklow_params, "WORKFLOW_CUSTOM_PARAMETER")
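Once a run property has been retrieved, it can be handed to the catalog reader as a push-down predicate, which is what the question was after. The sketch below assumes the property is stored under the key "pushdownpredicate" (the name used in the question); the helper name load_with_predicate and the database and table_name arguments are placeholders. It relies on the push_down_predicate argument of create_dynamic_frame.from_catalog.

```python
from typing import Any, Mapping, Optional


def load_with_predicate(
    glue_context: Any,
    run_properties: Optional[Mapping[str, str]],
    database: str,
    table_name: str,
    property_key: str = "pushdownpredicate",
) -> Any:
    """Read a catalog table, filtering partitions with a workflow run property."""
    # Fall back to an empty predicate (no partition filtering) when the
    # property is missing, e.g. when the job runs outside a workflow.
    predicate = (run_properties or {}).get(property_key, "")
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        push_down_predicate=predicate,
    )
```

Passing the context in as an argument keeps the helper independent of how the job script builds its GlueContext.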
If you run a Glue job from a workflow, sys.argv (which is a list) will contain the parameters --WORKFLOW_NAME and --WORKFLOW_RUN_ID. You can use this fact to check whether a Glue job is being run by a workflow, and then retrieve the workflow run properties:
import sys

import boto3
from awsglue.utils import getResolvedOptions


# The function name is illustrative; the original snippet was a bare function body
def get_run_properties():
    glue_client = boto3.client("glue")
    if '--WORKFLOW_NAME' in sys.argv and '--WORKFLOW_RUN_ID' in sys.argv:
        glue_args = getResolvedOptions(
            sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID']
        )
        workflow_args = glue_client.get_workflow_run_properties(
            Name=glue_args['WORKFLOW_NAME'], RunId=glue_args['WORKFLOW_RUN_ID']
        )["RunProperties"]
        return {**workflow_args}
    else:
        raise Exception("GlueJobNotStartedByWorkflow")
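The same sys.argv check can be tried locally without the awsglue package. The sketch below assumes the arguments arrive as separate "--KEY value" tokens, as the membership test above implies; the function name find_workflow_ids is hypothetical.

```python
from typing import Dict, Optional, Sequence


def find_workflow_ids(argv: Sequence[str]) -> Optional[Dict[str, str]]:
    """Return the workflow name and run ID found in argv, or None if absent."""
    wanted = ("WORKFLOW_NAME", "WORKFLOW_RUN_ID")
    found: Dict[str, str] = {}
    # Stop one token early so argv[index + 1] always exists.
    for index, token in enumerate(argv[:-1]):
        name = token[2:] if token.startswith("--") else None
        if name in wanted:
            found[name] = argv[index + 1]
    # Only report a workflow run when both arguments were supplied.
    return found if len(found) == len(wanted) else None
```

Returning None instead of raising lets a job that is also started standalone fall back to its own job arguments.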