Prefect Task Scheduling

I am new to Prefect, having worked mostly with Airflow. I have put together a workflow that executes fine, but the tasks don't execute in the order I expect. Here is the flow:

with Flow(name="4chan_extract") as flow:
    board_param = Parameter(name="board_name", required=True, default="pol")
    getData(board=board_param)
    checkDB(url="postgresql://postgres:user@localhost:5434/postgres")
    upload_raw(
        url="postgresql://postgres:user@localhost:5434/postgres",
        board=board_param,
    )
    remove_dupes(board=board_param)

However, when I visualize this flow with flow.visualize(), the DAG looks really odd.

My understanding is that the with context manager sets the order. Is that wrong? Using up_stream in each task didn't help.

Any help is appreciated.

If you want your tasks to run sequentially, one after the other, you can add upstream_tasks to each of your tasks. To make passing state dependencies easy, assign the result of each task call to a name (data = get_data(board=board_param)); you can then pass that named reference to downstream tasks.
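Why the original DAG looks odd: a scheduler orders tasks only by the dependencies it knows about, not by the line order inside the with block. In the question's flow no task receives another task's result and none sets upstream_tasks, so apart from their shared board_name parameter the tasks are unconnected. The sketch below is a standard-library analogy, not Prefect code: it uses Python's graphlib (3.9+) to compare which of the answer's four tasks could start immediately with no edges versus with the upstream_tasks chain. The helper names first_wave and run_order are hypothetical, and the parameter is left out for brevity.

```python
from graphlib import TopologicalSorter

# Task names mirror the answer; the board_name parameter is omitted for brevity.
TASKS = ["get_data", "check_db", "upload_raw", "remove_duplicates"]


def first_wave(edges):
    """Return the tasks a scheduler could start immediately (hypothetical helper).

    edges maps a task name to the set of task names it must wait for.
    """
    ts = TopologicalSorter()
    for task in TASKS:
        ts.add(task, *edges.get(task, ()))
    ts.prepare()
    return set(ts.get_ready())


def run_order(edges):
    """Return one execution order that respects the declared edges."""
    graph = {t: edges.get(t, set()) for t in TASKS}
    return list(TopologicalSorter(graph).static_order())


# The question's flow: no task result is passed on and no upstream_tasks are set.
no_edges = {}

# The answer's flow: each task lists the previous one in upstream_tasks.
chained = {
    "check_db": {"get_data"},
    "upload_raw": {"check_db"},
    "remove_duplicates": {"upload_raw"},
}

if __name__ == "__main__":
    print(sorted(first_wave(no_edges)))  # all four tasks: nothing forces an order
    print(sorted(first_wave(chained)))   # ['get_data']
    print(run_order(chained))  # ['get_data', 'check_db', 'upload_raw', 'remove_duplicates']
```

With no edges every task is ready at once, so nothing guarantees they run in the order they were written; with the chain there is exactly one valid order.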

I can only guess how you want this flow to look, but assuming you want it to run sequentially, here is a full example and a DAG visualization:

from prefect import task, Flow, Parameter


@task
def get_data(board):
    pass


@task
def check_db(url):
    pass


@task
def upload_raw(url, board):
    pass


@task
def remove_duplicates(board):
    pass


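with Flow(name="4chan_extract") as flow:
    board_param = Parameter(name="board_name", required=True, default="pol")
    # Keep a reference to each task call so later tasks can depend on it.
    data = get_data(board=board_param)
    # upstream_tasks adds a state dependency: check_db waits for get_data.
    check = check_db(
        url="postgresql://postgres:user@localhost:5434/postgres", upstream_tasks=[data]
    )
    # upload_raw waits for check_db.
    upload = upload_raw(
        url="postgresql://postgres:user@localhost:5434/postgres",
        board=board_param,
        upstream_tasks=[check],
    )
    # remove_duplicates runs last, after upload_raw.
    remove_duplicates(board=board_param, upstream_tasks=[upload])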

if __name__ == "__main__":
    flow.visualize()

[Image: DAG visualization of the flow produced by flow.visualize()]
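With the chain in place, each task starts only after its upstream task has finished. In Prefect 1.x the default trigger is all_successful, so a downstream task also does not run if an upstream task failed. The standard-library sketch below imitates that behaviour for the answer's four tasks; run_flow and the "Success"/"Failed"/"Skipped" labels are hypothetical simplifications, not Prefect's API or its actual state names.

```python
from graphlib import TopologicalSorter


def run_flow(tasks, upstream):
    """Run zero-argument callables in dependency order (hypothetical helper).

    tasks:    dict of task name -> callable
    upstream: dict of task name -> set of names it must wait for
    A task whose upstream tasks did not all succeed is not called.
    """
    graph = {name: upstream.get(name, set()) for name in tasks}
    states = {}
    for name in TopologicalSorter(graph).static_order():
        # Upstream tasks always come earlier in a topological order.
        if any(states[u] != "Success" for u in graph[name]):
            states[name] = "Skipped"  # simplified stand-in for a failed trigger
            continue
        try:
            tasks[name]()
            states[name] = "Success"
        except Exception:
            states[name] = "Failed"
    return states


if __name__ == "__main__":
    chain = {
        "check_db": {"get_data"},
        "upload_raw": {"check_db"},
        "remove_duplicates": {"upload_raw"},
    }

    def db_down():
        raise ConnectionError("database unreachable")

    tasks = {
        "get_data": lambda: print("getting data"),
        "check_db": db_down,
        "upload_raw": lambda: print("uploading"),
        "remove_duplicates": lambda: print("removing duplicates"),
    }
    # {'get_data': 'Success', 'check_db': 'Failed', 'upload_raw': 'Skipped', 'remove_duplicates': 'Skipped'}
    print(run_flow(tasks, chain))
```

Because check_db fails, neither upload_raw nor remove_duplicates is called, which is the practical benefit of chaining the tasks instead of leaving them unconnected.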
