
A specific DAG stops running in Airflow, even when the scheduler is running fine

My question is about a DAG that stops executing tasks even though the scheduler appears to be running fine. I am running a remote Airflow server, version 1.8.0, on an Ubuntu EC2 instance. The scheduler uses the LocalExecutor. I start the scheduler with airflow scheduler -D and everything seems to work: all jobs execute dependably, and jobs triggered externally via an API call work fine as well.
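For reference, here is a minimal sketch of how one might confirm which executor and concurrency limits the running scheduler actually picked up; it assumes Airflow 1.8's configuration module and the standard [core] option names.

    # Hedged sketch: print the executor and concurrency settings in effect,
    # assuming Airflow 1.8's configuration module and default [core] options.
    from airflow import configuration

    print(configuration.get('core', 'executor'))            # expect LocalExecutor
    print(configuration.getint('core', 'parallelism'))      # total task slots for the executor
    print(configuration.getint('core', 'dag_concurrency'))  # running-task cap per DAG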

The one glaring exception is a job we use to process site visit data, which keeps choking. Specifically, the job reports that it is running, but nothing is sent to the queue and no tasks actually run. In the UI, it just gets stuck in a status that looks like this:

[screenshot: DAG run stuck in the Airflow UI]

A few tasks execute, the rest just get stuck in the queue, and the DAG lists itself as "running" indefinitely. What's puzzling, though, is that when I check system activity with htop, the scheduler seems to be actively processing the task queue:

[screenshot: htop output showing active scheduler processes]

"Out_of_Aurum_Task_X" indicate a task from the current DAG run. We can see from the task inventory, however, that these tasks are just stuck in "queued" without actually running: 在此处输入图片说明

I have checked /logs and everything looks good. There are no errors or warnings popping up in either the individual task logs or in airflow-scheduler.log.

I'm stumped. What's breaking my DAG, and how can I fix it? In case it is relevant, I connect to MySQL databases via a combination of Airflow's MySQL hooks and connections defined within some custom classes using sqlalchemy. If I repeatedly stop and restart the scheduler, the job eventually completes, but obviously this isn't ideal behavior. Is this a metadata issue? Should I follow the steps here to wipe and re-initialize the metadata DB?
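For context, a rough sketch of the two connection styles mentioned above; the connection id, URI, and queries are hypothetical placeholders, not the actual code:

    # Hedged sketch of both connection styles; all names are placeholders.
    from airflow.hooks.mysql_hook import MySqlHook
    from sqlalchemy import create_engine

    # Airflow-managed connection, looked up by conn_id from the metadata DB.
    hook = MySqlHook(mysql_conn_id='visits_mysql')               # hypothetical conn id
    rows = hook.get_records('SELECT COUNT(*) FROM site_visits')  # hypothetical query

    # Custom class that builds its own SQLAlchemy engine directly.
    engine = create_engine('mysql://user:password@host/db')      # hypothetical URI
    with engine.connect() as conn:
        conn.execute('SELECT 1')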

UPDATE

I started killing ghost processes one at a time with kill -9 PID. As I was doing this, I noticed that a whole bunch of fresh processes appeared, and indeed the DAG started to run again. Just another piece of information.
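For anyone trying the same thing, a minimal sketch of how the lingering processes can be enumerated before killing them, assuming psutil is installed; the "airflow run" match string is an assumption about how the worker processes show up in the process table:

    # Hedged sketch: list processes whose command line mentions "airflow run".
    import psutil

    for proc in psutil.process_iter():
        try:
            cmd = ' '.join(proc.cmdline())
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
        if 'airflow run' in cmd:
            print(proc.pid, cmd)
            # proc.kill()  # same effect as kill -9 <PID>; use with care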

Actually, we had similar problems until we updated to 1.9.0. Note that from 1.9 onwards you will need to pip install "apache-airflow>=1.9.0" (instead of just airflow). The scheduler has some significant improvements there, so I highly recommend you upgrade. You'll need to adjust your logging, as logging changed in that version, but we've found 1.9 significantly more stable and the most stable release to date (since 1.6.x).
