Airflow - How to pass xcom variable into Python function

I need to reference a variable that's returned by a BashOperator. In my task_archive_s3_file, I need to get the filename from get_s3_file. The task simply prints {{ ti.xcom_pull(task_ids=submit_file_to_spark) }} as a string instead of the value.

If I use the bash_command, the value prints correctly.

get_s3_file = PythonOperator(
    task_id='get_s3_file',
    python_callable=obj.func_get_s3_file,
    trigger_rule=TriggerRule.ALL_SUCCESS,
    dag=dag)

submit_file_to_spark = BashOperator(
    task_id='submit_file_to_spark',
    bash_command="echo 'hello world'",
    trigger_rule="all_done",
    xcom_push=True,
    dag=dag)

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
#    bash_command="echo {{ ti.xcom_pull(task_ids='submit_file_to_spark') }}",
    python_callable=obj.func_archive_s3_file,
    params={'s3_path_filename': "{{ ti.xcom_pull(task_ids=submit_file_to_spark) }}" },
    dag=dag)

get_s3_file >> submit_file_to_spark >> task_archive_s3_file

Templates like {{ ti.xcom_pull(...) }} can only be used inside of parameters that support templates, or they won't be rendered prior to execution. See the template_fields and template_ext attributes of the PythonOperator and BashOperator.
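
A quick way to see which arguments are actually templated is to inspect those attributes on the operator classes themselves; a small sketch (the exact tuples vary between Airflow versions):

from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# In the 1.10.x line these are roughly:
print(PythonOperator.template_fields)  # ('templates_dict', 'op_args', 'op_kwargs')
print(BashOperator.template_fields)    # ('bash_command', 'env')
# Note that 'params' is not a templated field, which is why the template string
# passed via params in the question is never rendered.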

So templates_dict is what you use to pass templates to your Python operator:

def func_archive_s3_file(**context):
    archive(context['templates_dict']['s3_path_filename'])

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
    dag=dag,
    python_callable=obj.func_archive_s3_file,
    provide_context=True,  # must pass this because templates_dict gets passed via context
    templates_dict={'s3_path_filename': "{{ ti.xcom_pull(task_ids='submit_file_to_spark') }}" })

However, in the case of fetching an XCom value, another alternative is just using the TaskInstance object made available to you via context:

def func_archive_s3_file(**context):
    archive(context['ti'].xcom_pull(task_ids='submit_file_to_spark'))

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
    dag=dag,
    python_callable=obj.func_archive_s3_file,
    provide_context=True)
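
One detail worth knowing here: xcom_pull defaults to key='return_value', which is where a PythonOperator's return value ends up and where a BashOperator with xcom_push=True stores the last line of its stdout. If the value was pushed under a custom key you have to pass that key explicitly; a small sketch (the custom key name is illustrative):

def func_archive_s3_file(**context):
    ti = context['ti']
    # Default key ('return_value'), e.g. the last stdout line of submit_file_to_spark
    s3_path_filename = ti.xcom_pull(task_ids='submit_file_to_spark')
    # If it had been pushed via ti.xcom_push(key='s3_path_filename', value=...):
    # s3_path_filename = ti.xcom_pull(task_ids='submit_file_to_spark', key='s3_path_filename')
    archive(s3_path_filename)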

Upvoted both the question and the answer, but I think that this can be made a little more clear for those users who just want to pass small data objects between PythonOperator tasks in their DAGs. Referencing this question and this XCom example got me to the following solution. Super simple:

from datetime import datetime
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

DAG = DAG(
  dag_id='example_dag',
  start_date=datetime.now(),
  schedule_interval='@once'
)

def push_function(**kwargs):
    ls = ['a', 'b', 'c']
    return ls

push_task = PythonOperator(
    task_id='push_task', 
    python_callable=push_function,
    provide_context=True,
    dag=DAG)

def pull_function(**kwargs):
    ti = kwargs['ti']
    ls = ti.xcom_pull(task_ids='push_task')
    print(ls)

pull_task = PythonOperator(
    task_id='pull_task', 
    python_callable=pull_function,
    provide_context=True,
    dag=DAG)

push_task >> pull_task

I'm not sure why this works, but it does. A few questions for the community:

  • What's happening with ti here? How is that built in to **kwargs?
  • Is provide_context=True necessary for both functions?

Any edits to make this answer clearer are very welcome!
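
For what it's worth, a rough paraphrase (not the verbatim source) of what the 1.10-era PythonOperator does when provide_context=True shows where ti in **kwargs comes from; and strictly speaking only pull_function needs the context here, since push_function just returns a value that gets pushed to XCom regardless:

# Rough paraphrase of PythonOperator.execute in Airflow 1.10 (not verbatim):
def execute(self, context):
    if self.provide_context:
        # The template context (which includes 'ti', 'task_instance', 'ds',
        # 'dag', ...) is merged into the keyword arguments of the callable.
        context.update(self.op_kwargs)
        context['templates_dict'] = self.templates_dict
        self.op_kwargs = context
    return self.python_callable(*self.op_args, **self.op_kwargs)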

Used the same code and modified params like start_date, etc.

import airflow
from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

DAG = DAG(
  dag_id='simple_xcom',
  default_args=args,
#  start_date=datetime(2019, 04, 21),
  schedule_interval="@daily",
  #schedule_interval=timedelta(1)
)

def push_function(**context):
    msg='the_message'
    print("message to push: '%s'" % msg)
    task_instance = context['task_instance']
    task_instance.xcom_push(key="the_message", value=msg)

push_task = PythonOperator(
    task_id='push_task', 
    python_callable=push_function,
    provide_context=True,
    dag=DAG)

def pull_function(**kwargs):
    ti = kwargs['ti']
    msg = ti.xcom_pull(task_ids='push_task',key='the_message')
    print("received message: '%s'" % msg)

pull_task = PythonOperator(
    task_id='pull_task', 
    python_callable=pull_function,
    provide_context=True,
    dag=DAG)

push_task >> pull_task

If you wonder where the context['task_instance'] and kwargs['ti'] come from, you can refer to the Airflow macro documentation.
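
In short, ti is just a shorthand for task_instance: both context keys point at the same TaskInstance object, as in this small sketch:

def my_callable(**context):
    # Both keys reference the TaskInstance of the currently running task
    assert context['ti'] is context['task_instance']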

In Airflow 2.0 (released December 2020), the TaskFlow API has made passing XComs easier. With this API, you can simply return values from functions annotated with @task, and they will be passed as XComs behind the scenes. Example from the tutorial:

    @task()
    def extract():
        ...
        return order_data_dict
    
    @task()
    def transform(order_data_dict: dict):
        ...
        return total_order_value

    order_data = extract()
    order_summary = transform(order_data)

In this example, order_data has type XComArg. It stores the dictionary returned by the extract task. When the transform task runs, order_data is unwrapped, and the task receives the plain Python object that was stored.
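
For reference, a minimal runnable version of that pattern might look like the following sketch (the DAG id, dates, and sample values are illustrative, not taken from the tutorial):

from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def example_taskflow_xcom():

    @task()
    def extract():
        # The returned dict is pushed to XCom automatically
        return {"1001": 301.27, "1002": 433.21}

    @task()
    def transform(order_data_dict: dict):
        # The upstream XCom arrives here already unwrapped as a plain dict
        return sum(order_data_dict.values())

    transform(extract())

example_taskflow_xcom_dag = example_taskflow_xcom()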

If you want to pass an XCom to a BashOperator in Airflow 2, use env; let's say you have pushed an XCom my_xcom_var, then you can use Jinja inside env to pull the XCom value, e.g.

BashOperator(
    task_id='mytask',
    bash_command="echo ${MYVAR}",
    env={"MYVAR": '{{ ti.xcom_pull(key=\'my_xcom_var\') }}'},
    dag=dag
)

Check https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/bash/index.html#module-airflow.operators.bash for more details.
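
For completeness, a minimal sketch of the push side this assumes (the task id, callable name, and value are illustrative):

# Airflow 2.x import path
from airflow.operators.python import PythonOperator

def push_it(**context):
    # Push under the custom key that the BashOperator's env template pulls
    context['ti'].xcom_push(key='my_xcom_var', value='some_value')

push_xcom = PythonOperator(
    task_id='push_xcom',
    python_callable=push_it,
    dag=dag)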

The Airflow BaseOperator defines a property output that you can use to access the XCom content of the given operator. Here is a concrete example:

with DAG(...):
    push_task = PythonOperator(
        task_id='push_task', 
        python_callable=lambda: 'Hello, World!')

    pull_task = PythonOperator(
        task_id='pull_task', 
        python_callable=lambda x: print(x),
        op_args=[push_task.output])

which should be almost equivalent to

with DAG(...):
    push_task = PythonOperator(
        task_id='push_task', 
        python_callable=lambda: 'Hello, World!')

    pull_task = PythonOperator(
        task_id='pull_task', 
        python_callable=lambda x: print(x),
        op_args=["{{ task_instance.xcom_pull('push_task') }}"])

As far as I know, the only difference is that the former implicitly defines push_task >> pull_task.
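
Under the hood, push_task.output wraps the operator in an XComArg, which resolves to the XCom that push_task pushed under the default return_value key when the downstream templated field is rendered; a small sketch, assuming Airflow 2.x:

from airflow.models.xcom_arg import XComArg

# .output simply returns an XComArg for the operator
assert isinstance(push_task.output, XComArg)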
