
Airflow's DAG runs multiple times in one minute, although it was scheduled to run every 5 minutes

I've created a DAG that is scheduled to run every 5 minutes using cron syntax. I've also created a pool for this DAG, with a single slot.

I've tried restarting the server/scheduler and resetting the database. Currently the DAG runs in UTC. I've also tried setting my local timezone, 'Europe/Minsk' (UTC+3), and it has no effect.

import random
import time
import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'pool': 'download',
    # 'priority_weight': 10,
    # 'queue': 'bash_queue',
}

params = {
    'table': 'api_avitoimage',
}

dag = DAG(
    dag_id='test_download_avitoimage',
    default_args=default_args,
    schedule_interval='*/5 * * * *',
)


def sleep_for_a_bit(random_base):
    time.sleep(random_base)

with dag:

    download = BashOperator(
        task_id='download',
        bash_command='/usr/bin/python3 /home/artur/downloader.py --table {{ params.table }}',
        params=params,
        dag=dag)

    sleep = PythonOperator(
        task_id='sleep_for_a_bit',
        python_callable=sleep_for_a_bit,
        op_kwargs={'random_base': random.uniform(0, 1)},
        dag=dag,
    )

    download >> sleep

Issue: the DAG runs about 2-3 times per minute, which is clearly not the intended schedule. EDIT: It turns out there are 16/16 simultaneously active DAG runs, but I cannot understand where this "magic number 16" comes from.

By default, Airflow tries to run every schedule interval that was "missed" since start_date (this behaviour is called catchup). Because your start_date is set to airflow.utils.dates.days_ago(2), Airflow has to work through about 576 DAG runs (2 days × 288 five-minute intervals per day) before it starts launching runs on schedule. You can turn this off by passing catchup=False to the DAG constructor (not in default_args).
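As a quick check of the 576 figure, the backlog can be estimated with plain standard-library arithmetic. This is only a sketch: `missed_runs` is a hypothetical helper name, not part of Airflow, and it ignores edge effects at the boundaries of the window.

```python
from datetime import timedelta


def missed_runs(lookback, interval):
    """Number of whole schedule intervals that fit into the lookback window."""
    # Dividing one timedelta by another with // yields an int.
    return lookback // interval


# start_date two days back, one run every five minutes
backlog = missed_runs(timedelta(days=2), timedelta(minutes=5))
print(backlog)  # 576
```

With catchup left on, the scheduler has to clear this backlog, which is why runs appear far more often than every 5 minutes.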

The magic number 16 comes from the max_active_runs_per_dag setting in airflow.cfg, which defaults to 16.
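A minimal sketch of how the DAG from the question might be declared with both settings applied. The fixed start_date is a hypothetical placeholder, `build_dag` is an illustrative helper name, and `max_active_runs` is the per-DAG counterpart of max_active_runs_per_dag (it is not mentioned in the answer above, so treat its use here as an assumption about what you want).

```python
from datetime import datetime, timedelta

# Keyword arguments for the DAG constructor, kept in a plain dict so the
# two relevant settings are easy to spot.
dag_kwargs = {
    'dag_id': 'test_download_avitoimage',
    'default_args': {
        'owner': 'airflow',
        'start_date': datetime(2019, 1, 1),  # hypothetical placeholder date
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
        'pool': 'download',
    },
    'schedule_interval': '*/5 * * * *',
    'catchup': False,      # DAG-level argument, not a default_args key
    'max_active_runs': 1,  # per-DAG limit instead of the global default of 16
}


def build_dag():
    # Imported here so the settings above can be read without Airflow installed.
    from airflow.models import DAG
    return DAG(**dag_kwargs)
```

In a real DAG file you would assign `dag = build_dag()` at module level so the scheduler can discover it, and then attach the operators inside `with dag:` as in the question.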

