Airflow - How to pass xcom variable into Python function
I need to reference a variable that's returned by a BashOperator. In my task_archive_s3_file, I need to get the filename from get_s3_file. The task simply prints {{ ti.xcom_pull(task_ids=submit_file_to_spark) }} as a string instead of the value.

If I use the bash_command, the value prints correctly.
get_s3_file = PythonOperator(
    task_id='get_s3_file',
    python_callable=obj.func_get_s3_file,
    trigger_rule=TriggerRule.ALL_SUCCESS,
    dag=dag)

submit_file_to_spark = BashOperator(
    task_id='submit_file_to_spark',
    bash_command="echo 'hello world'",
    trigger_rule="all_done",
    xcom_push=True,
    dag=dag)

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
    # bash_command="echo {{ ti.xcom_pull(task_ids='submit_file_to_spark') }}",
    python_callable=obj.func_archive_s3_file,
    params={'s3_path_filename': "{{ ti.xcom_pull(task_ids=submit_file_to_spark) }}"},
    dag=dag)

get_s3_file >> submit_file_to_spark >> task_archive_s3_file
Templates like {{ ti.xcom_pull(...) }} can only be used inside of parameters that support templates, or they won't be rendered prior to execution. See the template_fields and template_ext attributes of the PythonOperator and BashOperator.
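As a rough illustration of why this matters, here is a minimal sketch using plain Jinja (not Airflow itself; the FakeTI class and the example filename are invented for the demo): Airflow passes only the strings listed in template_fields through Jinja with the task context, while every other argument keeps its raw string value.

```python
from jinja2 import Template

# Stand-in for the TaskInstance Airflow would put into the render context.
class FakeTI:
    def xcom_pull(self, task_ids=None, key=None):
        return "s3://bucket/data.csv"  # pretend this was pushed earlier

context = {"ti": FakeTI()}

# A field listed in template_fields gets rendered before execution...
templated = Template(
    "{{ ti.xcom_pull(task_ids='submit_file_to_spark') }}"
).render(**context)
print(templated)  # s3://bucket/data.csv

# ...while a non-templated field (like params on PythonOperator)
# is passed through untouched, which is what the question observed.
raw = "{{ ti.xcom_pull(task_ids='submit_file_to_spark') }}"
print(raw)
```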
So templates_dict is what you use to pass templates to your python operator:
def func_archive_s3_file(**context):
    archive(context['templates_dict']['s3_path_filename'])

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
    dag=dag,
    python_callable=obj.func_archive_s3_file,
    provide_context=True,  # must pass this because templates_dict gets passed via context
    templates_dict={'s3_path_filename': "{{ ti.xcom_pull(task_ids='submit_file_to_spark') }}"})
However, in the case of fetching an XCom value, another alternative is just using the TaskInstance object made available to you via context:
def func_archive_s3_file(**context):
    archive(context['ti'].xcom_pull(task_ids='submit_file_to_spark'))

task_archive_s3_file = PythonOperator(
    task_id='archive_s3_file',
    dag=dag,
    python_callable=obj.func_archive_s3_file,
    provide_context=True)
Upvoted both the question and the answer, but I think that this can be made a little more clear for those users who just want to pass small data objects between PythonOperator tasks in their DAGs. Referencing this question and this XCom example got me to the following solution. Super simple:
from datetime import datetime
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

DAG = DAG(
    dag_id='example_dag',
    start_date=datetime.now(),
    schedule_interval='@once'
)

def push_function(**kwargs):
    ls = ['a', 'b', 'c']
    return ls

push_task = PythonOperator(
    task_id='push_task',
    python_callable=push_function,
    provide_context=True,
    dag=DAG)

def pull_function(**kwargs):
    ti = kwargs['ti']
    ls = ti.xcom_pull(task_ids='push_task')
    print(ls)

pull_task = PythonOperator(
    task_id='pull_task',
    python_callable=pull_function,
    provide_context=True,
    dag=DAG)

push_task >> pull_task
I'm not sure why this works, but it does. A few questions for the community:

What is going on with ti here? How is that built in to **kwargs? Is provide_context=True necessary for both functions?

Any edits to make this answer clearer are very welcome!
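To the ti question above, here is a rough pure-Python sketch of what provide_context=True does (this is not Airflow's actual code; the class and function names are invented for illustration): Airflow builds a context dict that contains the running TaskInstance under both the ti and task_instance keys, and splats that dict into your callable as keyword arguments, which is why kwargs['ti'] works.

```python
# Hypothetical simplification of how Airflow invokes a python_callable
# when provide_context=True: build the context dict, then splat it in.
class FakeTaskInstance:
    """Stand-in for airflow.models.TaskInstance."""
    def __init__(self, xcoms):
        self._xcoms = xcoms
    def xcom_pull(self, task_ids=None, key=None):
        return self._xcoms.get(task_ids)

def run_with_context(python_callable, task_instance):
    context = {"ti": task_instance, "task_instance": task_instance}
    return python_callable(**context)  # context keys become kwargs

def pull_function(**kwargs):
    ti = kwargs["ti"]  # present because the context dict was splatted in
    return ti.xcom_pull(task_ids="push_task")

ti = FakeTaskInstance({"push_task": ["a", "b", "c"]})
print(run_with_context(pull_function, ti))  # ['a', 'b', 'c']
```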
Used the same code and modified params like Startdate etc.
import airflow
from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

DAG = DAG(
    dag_id='simple_xcom',
    default_args=args,
    # start_date=datetime(2019, 4, 21),
    schedule_interval="@daily",
    # schedule_interval=timedelta(1)
)

def push_function(**context):
    msg = 'the_message'
    print("message to push: '%s'" % msg)
    task_instance = context['task_instance']
    task_instance.xcom_push(key="the_message", value=msg)

push_task = PythonOperator(
    task_id='push_task',
    python_callable=push_function,
    provide_context=True,
    dag=DAG)

def pull_function(**kwargs):
    ti = kwargs['ti']
    msg = ti.xcom_pull(task_ids='push_task', key='the_message')
    print("received message: '%s'" % msg)

pull_task = PythonOperator(
    task_id='pull_task',
    python_callable=pull_function,
    provide_context=True,
    dag=DAG)

push_task >> pull_task
If you wonder where context['task_instance'] and kwargs['ti'] come from, you can refer to the Airflow macro documentation.
In Airflow 2.0 (released December 2020), the TaskFlow API has made passing XComs easier. With this API, you can simply return values from functions annotated with @task, and they will be passed as XComs behind the scenes. Example from the tutorial:
@task()
def extract():
    ...
    return order_data_dict

@task()
def transform(order_data_dict: dict):
    ...
    return total_order_value

order_data = extract()
order_summary = transform(order_data)
In this example, order_data has type XComArg. It stores the dictionary returned by the extract task. When the transform task runs, order_data is unwrapped, and the task receives the plain Python object that was stored.
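A rough pure-Python sketch of that unwrapping behaviour (this is not Airflow's internals; the class name and the sample order data are invented for illustration): calling a @task-decorated function returns a placeholder, and the concrete value is only pulled from XCom when the downstream task actually executes.

```python
# Hypothetical stand-in for Airflow's XComArg, for illustration only.
class XComArgSketch:
    def __init__(self, producer):
        self.producer = producer  # the upstream "task" callable
    def resolve(self):
        # In real Airflow this is an XCom pull done at task runtime.
        return self.producer()

def extract():
    return {"1001": 301.27, "1002": 433.21}

def transform(order_data_dict: dict):
    return round(sum(order_data_dict.values()), 2)

order_data = XComArgSketch(extract)              # nothing has run yet
order_summary = transform(order_data.resolve())  # unwrapped "at runtime"
print(order_summary)  # 734.48
```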
If you want to pass an xcom to a bash operator in airflow 2 use env; let's say you have pushed to a xcom my_xcom_var, then you can use jinja inside env to pull the xcom value, e.g.
BashOperator(
    task_id='mytask',
    bash_command="echo ${MYVAR}",
    env={"MYVAR": '{{ ti.xcom_pull(key=\'my_xcom_var\') }}'},
    dag=dag
)
Check https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/bash/index.html#module-airflow.operators.bash for more details.
The Airflow BaseOperator defines a property output that you can use to access the xcom content of the given operator. Here is a concrete example:
with DAG(...):
    push_task = PythonOperator(
        task_id='push_task',
        python_callable=lambda: 'Hello, World!')

    pull_task = PythonOperator(
        task_id='pull_task',
        python_callable=lambda x: print(x),
        op_args=[push_task.output])
which should be almost equivalent to
with DAG(...):
    push_task = PythonOperator(
        task_id='push_task',
        python_callable=lambda: 'Hello, World!')

    pull_task = PythonOperator(
        task_id='pull_task',
        python_callable=lambda x: print(x),
        op_args=["{{ task_instance.xcom_pull('push_task') }}"])
As far as I know, the only difference is that the former implicitly defines push_task >> pull_task.