In Python's Airflow, how can I stop a task from running after a certain time?

I'm trying to use Python's Airflow library. I want it to scrape a web page periodically.

The issue I'm having is that if my start_date is several days ago, when I start the scheduler it will backfill from the start_date to today. For example:

Assume today is the 20th of the month.

Assume the start_date is the 15th of this month.

If I start the scheduler on the 20th, it will scrape the page 5 times on the 20th. It will see that a DAG instance was supposed to run on the 15th, and will run that DAG instance (the one for the 15th) on the 20th. Then it will run the DAG instance for the 16th on the 20th, and so on.
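To make the arithmetic in this example concrete, here is a minimal sketch of which runs a catch-up would cover. It is a simplified model written in plain Python, not Airflow's scheduler code; the function name `pending_run_dates` and the year and month are assumptions chosen only for illustration.

```python
from datetime import date, timedelta


def pending_run_dates(start_date, today, interval=timedelta(days=1)):
    """List the schedule dates from start_date up to, but not including, today.

    Simplified model of catch-up behaviour; hypothetical helper, not an Airflow API.
    """
    dates = []
    current = start_date
    while current < today:
        dates.append(current)
        current += interval
    return dates


# Year and month are arbitrary; only the 15th and the 20th come from the example.
runs = pending_run_dates(date(2016, 3, 15), date(2016, 3, 20))
for run in runs:
    print(run.isoformat())
print(len(runs))  # 5
```

Each of these five dates corresponds to one DAG instance that would be executed on the 20th.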

In short, Airflow will try to "catch up", but this doesn't make sense for web scraping.

Is there any way to make Airflow consider a DAG instance failed after a certain time?

This feature is in the roadmap for Airflow, but does not currently exist.

See: Issue #1155

You may be able to hack together a solution using BranchPythonOperator. As it says in the documentation, make sure you have set depends_on_past=False (this is the default). I do not have Airflow set up, so I can't test this or provide example code at this time.
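One way the branching decision could look is sketched below. It is only the plain-Python callable, not a tested DAG: the task ids `scrape_page` and `skip_scrape`, the function name `choose_branch`, and the two-day cutoff are all assumptions for illustration. In Airflow, a BranchPythonOperator's callable returns the task_id of the branch to follow, so a function like this could be passed as its python_callable, with the run's execution date taken from the task context.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff. A daily run is triggered at the end of its interval,
# so even the newest run is already about one day old when it starts.
MAX_AGE = timedelta(days=2)


def choose_branch(execution_date, now=None, max_age=MAX_AGE):
    """Return the task_id to follow: scrape if the run is recent, otherwise skip.

    Both task ids are hypothetical names for downstream tasks.
    """
    if now is None:
        now = datetime.now()
    if now - execution_date > max_age:
        return "skip_scrape"
    return "scrape_page"


# Scheduler started shortly after midnight on the 20th (arbitrary year and month).
now = datetime(2016, 3, 20, 0, 5)
print(choose_branch(datetime(2016, 3, 15), now=now))  # skip_scrape
print(choose_branch(datetime(2016, 3, 19), now=now))  # scrape_page
```

With this rule, the backfilled instances for the 15th through the 17th would take the skip branch, and only the recent ones would actually scrape.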

Airflow was designed with backfilling in mind, so the roadmap item goes against its core logic.

For now you can update the start_date for this specific task or for the whole DAG.

Every operator has a start_date: http://pythonhosted.org/airflow/code.html#baseoperator

The scheduler is not designed to be stopped. If you start it today, you can set your task's start_date to today, which seems logical to me.


