Prefect Task Scheduling
I am new to Prefect, having worked mostly with Airflow. I have put together a workflow that executes fine, but the tasks don't execute in the order I expect. Here is the flow:

with Flow(name='4chan_extract') as flow:
    board_param = Parameter(name='board_name', required=True, default='pol')
    getData(board=board_param)
    checkDB(url='postgresql://postgres:user@localhost:5434/postgres')
    upload_raw(url="postgresql://postgres:user@localhost:5434/postgres",
               board=board_param)
    remove_dupes(board=board_param)

However, when I call flow.visualize() on this flow, the DAG looks really odd. My understanding is that the "with" context manager sets the order? Using up_stream in each task didn't help.
Any help is appreciated.

The "with Flow(...)" block only registers the tasks with the flow; it does not order them. Execution order comes from the dependencies between tasks. If you want your tasks to be called sequentially, one after the other, you can pass upstream_tasks when calling each task. Additionally, to pass state dependencies easily, you can assign a name to a task's result when calling it (data = get_data(board=board_param)); that named reference can then be passed to downstream tasks.
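
To illustrate why call order alone does not fix execution order, here is a minimal sketch that uses only Python's standard-library graphlib and no Prefect at all. It is a toy model of a dependency graph, not how Prefect is implemented; the dictionaries and the helper names first_ready and run_order are made up for this illustration. Each task maps to the set of tasks that must finish before it.

```python
from graphlib import TopologicalSorter

# Toy model (not Prefect): each task maps to the set of its upstream tasks.
# With no upstream tasks declared, nothing constrains the order.
no_edges = {
    "get_data": set(),
    "check_db": set(),
    "upload_raw": set(),
    "remove_duplicates": set(),
}

# The same tasks, each one depending on the task before it.
chained = {
    "get_data": set(),
    "check_db": {"get_data"},
    "upload_raw": {"check_db"},
    "remove_duplicates": {"upload_raw"},
}


def first_ready(graph):
    """Return the set of tasks that may start immediately."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    return set(sorter.get_ready())


def run_order(graph):
    """Return one valid execution order for the graph."""
    return list(TopologicalSorter(graph).static_order())


unordered_ready = first_ready(no_edges)  # all four tasks are ready at once
chained_ready = first_ready(chained)     # only get_data is ready
chained_order = run_order(chained)       # the chain admits a single order

if __name__ == "__main__":
    print(sorted(unordered_ready))
    print(sorted(chained_ready))
    print(chained_order)
```

Without edges, every task is eligible to run first, so a scheduler may pick any order. Once each task lists its predecessor as upstream, only one order remains valid.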
I can only guess what you want this flow to look like, but assuming you want it to run sequentially, here is a full example; the flow.visualize() call at the end renders the DAG:

from prefect import task, Flow, Parameter

# Placeholder task bodies; replace "pass" with the real logic.
@task
def get_data(board):
    pass

@task
def check_db(url):
    pass

@task
def upload_raw(url, board):
    pass

@task
def remove_duplicates(board):
    pass

with Flow(name="4chan_extract") as flow:
    board_param = Parameter(name="board_name", required=True, default="pol")
    # Keep a reference to each task call so the next task can list it as upstream.
    data = get_data(board=board_param)
    check = check_db(
        url="postgresql://postgres:user@localhost:5434/postgres", upstream_tasks=[data]
    )
    upload = upload_raw(
        url="postgresql://postgres:user@localhost:5434/postgres",
        board=board_param,
        upstream_tasks=[check],
    )
    remove_duplicates(board=board_param, upstream_tasks=[upload])

if __name__ == "__main__":
    # Render the DAG to confirm the tasks form a single chain.
    flow.visualize()