
How do I stop Airflow from triggering my Python scripts?

I am currently using Airflow to run a DAG (say dag.py) which has a few tasks, and then a Python script executed via BashOperator. The Python script (say report.py) basically reads data from a cloud (S3) location into a dataframe, does a few transformations, and then sends it out as a report over email.

But the issue I'm having is that Airflow is basically running this Python script, report.py, every time it scans the repository for changes (i.e. every 2 mins). So the script is being run every 2 mins (and hence the email is being sent out every two minutes).

Is there any workaround for this? Can we use something apart from a BashOperator (bear in mind that we need to do a few dataframe transformations before sending out the report)?

Thanks!

Just make sure you do everything serious in the tasks, not in the Python script itself. The script will be executed often by the scheduler, but it should simply create tasks and build the dependencies between them. The actual work is done in the 'execute' methods of the tasks.
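
A minimal sketch of that separation, assuming Airflow 2.x imports (the dag_id, S3 path, and transformation are hypothetical placeholders for your report.py logic):

```python
# dag.py -- sketch only; anything at module top level runs on every scheduler
# parse, so the heavy work lives inside the callable, which runs on schedule.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator

def build_report():
    # Executed only when the task instance actually runs, not at parse time.
    df = pd.read_csv("s3://my-bucket/data.csv")  # hypothetical S3 path (needs s3fs)
    df = df.groupby("region").sum()              # hypothetical transformation
    df.to_csv("/tmp/report.csv")

with DAG(
    dag_id="report_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(
        task_id="build_report",
        python_callable=build_report,
    )
```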

For example, rather than sending the email in the script, you should add an 'EmailOperator' as a task with the right dependencies, so the operator's execute method runs not when the file is parsed by the scheduler, but when all dependencies (other tasks) have completed.
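
Continuing the sketch above, the EmailOperator becomes its own task chained after the transformation, so the email goes out only when the upstream task succeeds (the recipient and file path are hypothetical):

```python
# Goes inside the same `with DAG(...)` block as the sketch above.
from airflow.operators.email import EmailOperator

send_report = EmailOperator(
    task_id="send_report",
    to="team@example.com",          # hypothetical recipient
    subject="Daily report",
    html_content="Report attached.",
    files=["/tmp/report.csv"],      # produced by build_report above
)

# The email task runs only after the transformation task completes.
transform >> send_report
```

With this layout, parsing the file every 2 minutes only rebuilds the task graph; no data is read and no email is sent until the scheduled run.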
