
How to pass a variable from one task to another in airflow

The below code works, but my requirement is to pass totalbuckets as an input to the function as opposed to a global variable. I am having trouble passing it as a variable and doing xcom_pull in the next task. This DAG basically creates buckets based on the number of inputs, and totalbuckets is a constant. Appreciate your help in advance.

from collections import defaultdict

from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.utils.trigger_rule import TriggerRule

# args, inputs_to_process and SF_CONN_ID are assumed to be defined elsewhere in the DAG file
with DAG('test-live', catchup=False, schedule_interval=None, default_args=args) as test_live:

    totalbuckets = 3

    # branches based on number of buckets
    def branch_buckets(**context):
        buckets = defaultdict(list)
        for i in range(len(inputs_to_process)):
            buckets[f'bucket_{(1 + i % totalbuckets)}'].append(inputs_to_process[i])

        for bucket_name, input_sublist in buckets.items():
            context['ti'].xcom_push(key=bucket_name, value=input_sublist)
        return list(buckets.keys())

    # BranchPythonOperator will launch the buckets and distribute inputs among them
    branch_buckets = BranchPythonOperator(
        task_id='branch_buckets',
        python_callable=branch_buckets,
        trigger_rule=TriggerRule.NONE_FAILED,
        provide_context=True,
        dag=test_live
    )

    # update provider tables with merge sql
    def update_inputs(sf_conn_id, bucket_name, **context):
        input_sublist = context['ti'].xcom_pull(task_ids='branch_buckets', key=bucket_name)
        print(f"Processing inputs {input_sublist} in {bucket_name}")

        from custom.hooks.snowflake_hook import SnowflakeHook
        for p in input_sublist:
            merge_sql = f"""
            merge into ......"""

    bucket_tasks = []
    for i in range(totalbuckets):
        task = PythonOperator(
            task_id=f'bucket_{i + 1}',
            python_callable=update_inputs,
            provide_context=True,
            op_kwargs={'bucket_name': f'bucket_{i + 1}', 'sf_conn_id': SF_CONN_ID},
            dag=test_live
        )
        bucket_tasks.append(task)

If totalbuckets differs from one run to another, it should be a run conf variable; you can provide it for each run created from the UI, CLI, Airflow REST API, or even the Python API.

from collections import defaultdict

from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.models.param import Param
from airflow.utils.trigger_rule import TriggerRule
with DAG(
    'test-live',
    catchup=False,
    schedule_interval=None,
    default_args=args,
    params={"totalbuckets": Param(default=3, type="integer")},
) as test_live:
    # branches based on number of buckets
    def branch_buckets(**context):
        # the param is available in the task execution context at run time
        totalbuckets = int(context['params']['totalbuckets'])

        buckets = defaultdict(list)
        for i in range(len(inputs_to_process)):
            buckets[f'bucket_{(1 + i % totalbuckets)}'].append(inputs_to_process[i])

        for bucket_name, input_sublist in buckets.items():
            context['ti'].xcom_push(key = bucket_name, value = input_sublist)
        return list(buckets.keys())

    # BranchPythonOperator will launch the buckets and distributes inputs among the buckets
    branch_buckets = BranchPythonOperator(
        task_id='branch_buckets',
        python_callable=branch_buckets,
        trigger_rule=TriggerRule.NONE_FAILED,
        provide_context=True,
        dag=test_live
    )
    # update provider tables with merge sql
    def update_inputs(sf_conn_id, bucket_name, **context):
        input_sublist = context['ti'].xcom_pull(task_ids='branch_buckets', key=bucket_name)
        print(f"Processing inputs {input_sublist} in {bucket_name}")

        from custom.hooks.snowflake_hook import SnowflakeHook
        for p in input_sublist:
            merge_sql=f"""
                merge into ......"""

    bucket_tasks = []
    for i in range(int("{{ params.totalbuckets }}")):
        task= PythonOperator(
            task_id=f'bucket_{i+1}',
            python_callable=update_inputs,
            provide_context=True,
            op_kwargs={'bucket_name':f'bucket_{i+1}','sf_conn_id': SF_CONN_ID},
            dag=test_live
        )
        bucket_tasks.append(task)

Example to run it:

airflow dags trigger --conf '{"totalbuckets": 10}' test-live

Or via the UI.
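For completeness, the same conf can also be passed through the Airflow 2 stable REST API; a minimal sketch, assuming the webserver runs on localhost:8080 and the basic-auth API backend is enabled with admin/admin credentials:

curl -X POST "http://localhost:8080/api/v1/dags/test-live/dagRuns" \
  --user "admin:admin" \
  -H "Content-Type: application/json" \
  -d '{"conf": {"totalbuckets": 10}}'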

Update:

And if it's static but differs from one environment to another, it can be an Airflow Variable, read directly in the tasks using Jinja to avoid querying it at every DAG file processing.
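A minimal sketch of that approach, assuming an Airflow Variable named totalbuckets exists; op_kwargs is a templated field of the Python operators, so the Variable is only read when the task runs:

from airflow.operators.python import BranchPythonOperator

def branch_buckets(totalbuckets, **context):
    # the Jinja template renders to a string, so cast it before use
    totalbuckets = int(totalbuckets)
    ...

branch_buckets_task = BranchPythonOperator(
    task_id='branch_buckets',
    python_callable=branch_buckets,
    # {{ var.value.<name> }} reads the Airflow Variable via Jinja at run time
    op_kwargs={'totalbuckets': '{{ var.value.totalbuckets }}'},
)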

But if it's completely static, the most recommended solution is using a Python variable as you do, because to read the dag run conf or an Airflow Variable, the task/DAG sends a query to the database.

@hussein awala I am doing something like below but cannot parse totalbuckets in bucket_tasks.

from collections import defaultdict

from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG('test-live', catchup=False, schedule_interval=None, default_args=args) as test_live:


#totalbuckets = 3


    def branch_buckets(totalbuckets, **context):

        buckets = defaultdict(list)
        for i in range(len(inputs_to_process)):
            buckets[f'bucket_{(1+i % totalbuckets)}'].append(inputs_to_process[i])
      
        for bucket_name, input_sublist in buckets.items():
            context['ti'].xcom_push(key = bucket_name, value = input_sublist)
        return list(buckets.keys())
    
    # BranchPythonOperator will launch the buckets and distributes inputs among the buckets
    branch_buckets = BranchPythonOperator(
        task_id='branch_buckets',
        python_callable=branch_buckets,
        trigger_rule=TriggerRule.NONE_FAILED,
        provide_context=True, op_kwargs={'totalbuckets':3},
        dag=test_live
    )  
# update provider tables with merge sql
    def update_inputs(sf_conn_id, bucket_name, **context):
        input_sublist = context['ti'].xcom_pull(task_ids='branch_buckets', key=bucket_name)
        print(f"Processing inputs {input_sublist} in {bucket_name}")

        from custom.hooks.snowflake_hook import SnowflakeHook
        for p in input_sublist:
            merge_sql=f"""
            merge into ......"""

    bucket_tasks = []
    for i in range(totalbuckets):
        task = PythonOperator(
            task_id=f'bucket_{i+1}',
            python_callable=update_inputs,
            provide_context=True,
            op_kwargs={'bucket_name': f'bucket_{i+1}', 'sf_conn_id': SF_CONN_ID},
            dag=test_live
        )
        bucket_tasks.append(task)
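For reference, a minimal sketch of one way to make totalbuckets available to both branch_buckets and the bucket tasks without a global inside the functions: define it once as a plain module-level constant and pass it to the branching operator through op_kwargs, while the parse-time loop uses the same constant. As in the snippets above, args, inputs_to_process and SF_CONN_ID are assumed to be defined elsewhere in the DAG file.

from collections import defaultdict

from airflow import DAG
from airflow.operators.python import PythonOperator, BranchPythonOperator
from airflow.utils.trigger_rule import TriggerRule

TOTALBUCKETS = 3  # single place to change the bucket count

# args, inputs_to_process and SF_CONN_ID are assumed to be defined elsewhere
with DAG('test-live', catchup=False, schedule_interval=None, default_args=args) as test_live:

    def branch_buckets(totalbuckets, **context):
        # distribute the inputs round-robin over the buckets and push each sublist as an XCom
        buckets = defaultdict(list)
        for i, item in enumerate(inputs_to_process):
            buckets[f'bucket_{1 + i % totalbuckets}'].append(item)
        for bucket_name, input_sublist in buckets.items():
            context['ti'].xcom_push(key=bucket_name, value=input_sublist)
        return list(buckets.keys())

    branch_task = BranchPythonOperator(
        task_id='branch_buckets',
        python_callable=branch_buckets,
        trigger_rule=TriggerRule.NONE_FAILED,
        # the constant reaches the callable as a normal argument
        op_kwargs={'totalbuckets': TOTALBUCKETS},
    )

    def update_inputs(sf_conn_id, bucket_name, **context):
        input_sublist = context['ti'].xcom_pull(task_ids='branch_buckets', key=bucket_name)
        print(f"Processing inputs {input_sublist} in {bucket_name}")

    bucket_tasks = [
        PythonOperator(
            task_id=f'bucket_{i + 1}',
            python_callable=update_inputs,
            op_kwargs={'bucket_name': f'bucket_{i + 1}', 'sf_conn_id': SF_CONN_ID},
        )
        # parse-time constant, so the number of tasks is known when the file is parsed
        for i in range(TOTALBUCKETS)
    ]

    branch_task >> bucket_tasks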
