Apache Airflow XCom pull from dynamic task name

I have successfully created dynamic tasks in a DAG (Bash and Docker operators), but I'm having a hard time passing the IDs of those dynamically created tasks to xcom_pull to grab data.

for i in range(0, max_tasks):
    task_scp_queue = BashOperator(task_id="scp_queue_task_{}".format(i), bash_command="""python foo""", retries=3, dag=dag, pool="scp_queue_pool", queue="foo", provide_context=True, xcom_push=True) # Pull the manifest ID from the previous task via xcom

    task_process_queue = DockerOperator(task_id="process_task_{}".format(i), command="""python foo --queue-name={{ task_instance.xcom_pull(task_ids=scp_queue_task_{}) }}""".format(i), retries=3, dag=dag, pool="process_pool", api_version="auto", image="foo", queue="foo", execution_timeout=timedelta(minutes=5))
    task_manifest = DockerOperator(api_version="auto", task_id="manifest_task_{}".format(i), image="foo", retries=3, dag=dag, command=""" python --manifestid={{ task_instance.xcom_pull(task_ids=scp_queue_task_{}) }}""".format(i), pool="manfiest_pool", queue="d_parser")

    task_psql_queue.set_downstream(task_scp_queue)
    task_process_queue.set_upstream(task_scp_queue)
    task_manifest.set_upstream(task_process_queue)

As you can see, I tried using a Python format string in the Jinja template to pass the i variable into it, but that doesn't work.

I've also tried using task.task_id, and creating a new string containing just the task_id, but that doesn't work either.

Edit:

Now the command looks like this:

command="""python foo \ 
    --queue-name="{{ 
    task_instance.xcom_pull(task_ids='scp_queue_task_{}') }}" 
     """.format(i)

And my debug logs from Airflow look like this:

Using Master Queue: process_{ 
task_instance.xcom_pull(task_ids='scp_queue_task_31') }

So the string value is being populated, but the xcom_pull is not being executed.
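The single braces in that log line come from Python's str.format, not from Airflow: in a format string, {{ and }} each produce one literal brace, and {} is the replacement field. The Airflow-free sketch below (i = 31 is picked only to match the log) shows the doubled-brace pattern collapsing to the single-brace text seen above, while quadrupled braces survive as the {{ ... }} delimiters that Jinja needs. f-strings follow the same escaping rule.

```python
# Airflow-free reproduction of what str.format does to the command templates.
# In a format string, "{{" and "}}" each produce ONE literal brace and "{}" is
# the replacement field that receives i.
i = 31  # chosen only to match the task ID in the debug log

# Doubled braces (the question's pattern): format() leaves single braces,
# which Jinja treats as plain text, so xcom_pull is never evaluated.
doubled = "--queue-name={{ task_instance.xcom_pull(task_ids='scp_queue_task_{}') }}".format(i)
print(doubled)
# prints: --queue-name={ task_instance.xcom_pull(task_ids='scp_queue_task_31') }

# Quadrupled braces (the answer's pattern): format() leaves "{{" and "}}",
# the delimiters Jinja renders as an expression.
quadrupled = "--queue-name={{{{ task_instance.xcom_pull(task_ids='scp_queue_task_{}') }}}}".format(i)
print(quadrupled)
# prints: --queue-name={{ task_instance.xcom_pull(task_ids='scp_queue_task_31') }}
```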

Answer:

I'm confused about why this isn't working. A log of the errors you're getting would be helpful.

In brief, what you're doing looks good. If max_tasks=2, you will get:

task_psql_queue.taskid --> scp_queue_task_0 >> process_task_0 >> manifest_task_0
                       \-> scp_queue_task_1 >> process_task_1 >> manifest_task_1

I suspect you don't need the timeouts, which are really short. Because your lines are very long and your named parameters are in no particular order, I'll reformat what you wrote:

for i in range(0, max_tasks):
    task_scp_queue = BashOperator(
        task_id="scp_queue_task_{}".format(i),
        dag=dag,
        retries=3,  # you could make it a default arg on the dag
        pool="scp_queue_pool",
        queue="foo", # you really want both queue and pool? When debugging remove them.
        bash_command="python foo",  # Maybe you snipped a multiline command
        provide_context=True,  # BashOp doesn't have this argument
        xcom_push=True,  # PUSH the manifest ID FOR the NEXT task via xcom
    )

    task_process_queue = DockerOperator(
        task_id="process_task_{}".format(i),
        dag=dag,
        retries=3,
        pool="process_pool",
        queue="foo",
        execution_timeout=timedelta(minutes=5),
        api_version="auto",
        image="foo",
        command="python foo --queue-name="
                "{{{{ task_instance.xcom_pull(task_ids=scp_queue_task_{}) }}}}".format(i),
    )

    task_manifest = DockerOperator(
        task_id="manifest_task_{}".format(i),
        retries=3,
        dag=dag,
        pool="manfiest_pool",
        queue="d_parser",
        api_version="auto",
        image="foo",
        command="python --manifestid="
                "{{{{ task_instance.xcom_pull(task_ids=scp_queue_task_{}) }}}}".format(i),
    )

    task_psql_queue >> task_scp_queue >> task_process_queue >> task_manifest
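The xcom_push=True flag on the BashOperator above is what gives the downstream xcom_pull something to return. In Airflow 1.x, BashOperator with xcom_push=True pushes the last line the command writes to stdout (in Airflow 2 the equivalent is the generic do_xcom_push argument), so python foo has to print the manifest ID as its final line. A minimal sketch of such a script, where find_manifest_id and the value it returns are hypothetical placeholders for whatever foo really does:

```python
def find_manifest_id():
    """Hypothetical placeholder for the real work done by the foo script."""
    return "manifest-0001"


def output_lines():
    """Lines the script writes to stdout, in order."""
    # Progress messages are fine, as long as they come before the ID.
    lines = ["copying queue files..."]
    # Only the final stdout line becomes the XCom value, so the ID goes last
    # and nothing may be printed after it.
    lines.append(find_manifest_id())
    return lines


def main():
    for line in output_lines():
        print(line)


if __name__ == "__main__":
    main()
```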

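Oh, now I see it: you didn't pass the task_ids as strings. Without quotes, Jinja treats scp_queue_task_0 as the name of a template variable (which is undefined) rather than as a task ID. Try: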

        command="python foo --queue-name="
                "{{{{ task_instance.xcom_pull(task_ids='scp_queue_task_{}') }}}}".format(i),
… … …
        command="python --manifestid="
                "{{{{ task_instance.xcom_pull(task_ids='scp_queue_task_{}') }}}}".format(i),

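Another way to sidestep the brace escaping entirely is to build the Jinja expression with %-formatting, which gives braces no special meaning, and to read the ID from the upstream operator's task_id attribute instead of retyping the pattern. This is a sketch, not the answer's own code: the helper xcom_pull_template and the FakeTask stand-in are hypothetical names, and FakeTask exists only so the example runs without Airflow installed. In a DAG you would pass task_scp_queue.task_id to the helper.

```python
def xcom_pull_template(task_id):
    """Hypothetical helper: return a Jinja expression pulling task_id's XCom.

    %-formatting leaves "{{" and "}}" untouched, so no brace escaping is
    needed, and the explicit single quotes make task_id a Jinja string.
    """
    return "{{ task_instance.xcom_pull(task_ids='%s') }}" % task_id


class FakeTask:
    """Stand-in for an Airflow operator; only task_id is used here."""

    def __init__(self, task_id):
        self.task_id = task_id


commands = []
for i in range(2):
    task_scp_queue = FakeTask("scp_queue_task_{}".format(i))
    # In the real DAG this string would be the DockerOperator's command.
    commands.append("python foo --queue-name=" + xcom_pull_template(task_scp_queue.task_id))

for command in commands:
    print(command)
```

Because the ID comes from the upstream task object, renaming the task_id pattern in one place cannot leave the xcom_pull pointing at a stale name.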
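Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the address of the original post. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM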