Airflow in Docker: how to add DAGs to Airflow?

I want to add DAG files to Airflow, which runs in Docker on Ubuntu. I used the following git repository, which contains the configuration and a link to the Docker image. When I run docker run -d -p 8080:8080 puckel/docker-airflow webserver , everything works fine. But I can't find a way to safely add DAGs to Airflow. Alternatively, I ran docker run -d -p 8080:8080 puckel/docker-airflow webserver -v /root/dags:/usr/local/airflow/dags , with no success either.

I tried editing /config/airflow.cfg and adding the git credentials for a repository containing DAGs, but without success. I also added a folder /dags in home/root/dags containing DAGs, assuming that this folder is shared with the Docker container. That did not work either.

The Docker Compose file contains the following volume settings:

webserver:
        image: puckel/docker-airflow:1.10.0-2
        ...
        volumes:
            - ./dags:/usr/local/airflow/dags 

But when I add files to ./dags in the folder from which I run the Docker container, the DAGs don't appear in Airflow.

How can I safely add DAGs to Airflow when it runs in Docker?

Adding a volume is the correct way:

docker run -d -p 8080:8080 -v /path/to/dags/on/your/local/machine/:/usr/local/airflow/dags  puckel/docker-airflow webserver
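The position of -v matters here: Docker passes everything after the image name to the container as its command and arguments, so the second attempt in the question (with -v placed after webserver) never creates a mount. As a minimal sketch, the helper below assembles the same command as an argument list with all options ahead of the image name. The function name build_run_command and the use of Python are illustrative assumptions, not part of the original answer.

```python
import shlex
from pathlib import Path

IMAGE = "puckel/docker-airflow"              # image name taken from the question
CONTAINER_DAGS = "/usr/local/airflow/dags"   # DAG folder inside the container


def build_run_command(host_dags_dir):
    """Return the argv for `docker run`, with every option before the image.

    Hypothetical helper for illustration: it only builds the command,
    it does not start a container.
    """
    # Resolve to an absolute path so the mount does not depend on the
    # directory the command happens to be run from.
    host_path = Path(host_dags_dir).expanduser().resolve()
    return [
        "docker", "run", "-d",
        "-p", "8080:8080",
        "-v", f"{host_path}:{CONTAINER_DAGS}",
        IMAGE,          # options must come before this element
        "webserver",    # everything from here on is passed to the container
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(build_run_command("./dags")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker.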

A full explanation is given in the following post by Mark Nagelberg.

By default, your Airflow config contains the following line:

dags_folder = /usr/local/airflow/dags

This tells Airflow to load DAGs from that folder. In your case, the path refers to a location inside the container.
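To confirm which folder a given airflow.cfg actually points at, the value can be read with Python's standard configparser. This is a rough sketch that assumes dags_folder sits in the [core] section, as it does in the Airflow 1.10 configuration files I have seen; the function name read_dags_folder and the sample text are made up for illustration.

```python
import configparser


def read_dags_folder(cfg_text):
    """Return the dags_folder value from the text of an airflow.cfg file.

    Assumes the setting lives under [core]; adjust if your config differs.
    """
    # Interpolation is disabled because airflow.cfg may contain literal '%'
    # characters (for example in log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "dags_folder")


# Hypothetical excerpt of a config file, matching the line quoted above.
SAMPLE_CFG = """
[core]
dags_folder = /usr/local/airflow/dags
"""

if __name__ == "__main__":
    print(read_dags_folder(SAMPLE_CFG))
```

The host side of the volume mount should map onto whatever path this returns when run against the config used inside the container.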

Check that the database container is up and running and that airflow initdb was executed. Airflow uses that metadata database to store the DAGs it loads.

As far as I know, the Airflow scheduler loads DAGs on every heartbeat, so make sure its interval, set in seconds in your airflow.cfg, is reasonable:

scheduler_heartbeat_sec = 5

Note that scanning the DAG folder for newly added files follows a separate setting, dag_dir_list_interval in the [scheduler] section, which defaults to 300 seconds, so a new file can take a few minutes to show up.

It might also help to check the Airflow logs inside the container. From your shell, run:

docker logs [container-id | container-name]

Hope this gives you some insight into your problem.

I've been using Airflow in Docker for a while, and loading and reloading code is still a bit buggy. The best solution for me is to restart the whole project ( docker-compose up -d --build ) every time I add a new DAG or modify a DAG's code, so that the webserver, scheduler and workers are all up to date.

My Docker + Airflow setup works well. Every DAG I add can be tested and runs smoothly.

The approach is:

  1. Expose the whole airflow directory as a volume instead of only the dags folder.

webserver:
        image: puckel/docker-airflow:1.10.0-2
        ...
        volumes:
            - ./airflow:/usr/local/airflow
  2. Edit the dags folder setting in the Airflow configuration file (by default it needs no change, because the dags folder sits under the airflow folder).
  3. Every time you add a DAG, check whether its name appears by running the following command:

    airflow list_dags

If it does not appear, double-check the newly added DAG Python file. Note that the command above checks the DAG file immediately, whereas the Airflow web UI usually lags by several seconds to minutes, depending on configuration and system load.
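One way to double-check a new DAG file before Airflow sees it is a small standalone script like the sketch below. It does two things: it compiles each .py file to catch syntax errors (without importing Airflow), and it looks for the strings DAG and airflow in the file contents. The second check rests on an assumption: to my knowledge, Airflow's DAG discovery in the 1.10 series skips files that lack both strings when safe mode is on, but verify this against your version. All function names here are hypothetical.

```python
import sys
from pathlib import Path

# Strings a file is assumed to need before Airflow treats it as a DAG file.
MARKERS = (b"DAG", b"airflow")


def check_dag_file(path):
    """Return a list of problems found in one candidate DAG file."""
    source = Path(path).read_bytes()
    problems = []

    missing = [m.decode() for m in MARKERS if m not in source]
    if missing:
        problems.append("missing marker(s): " + ", ".join(missing))

    try:
        # Compile only; the file is not executed and Airflow is not imported.
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        problems.append(f"syntax error on line {exc.lineno}: {exc.msg}")

    return problems


def check_dags_folder(folder):
    """Map each .py file directly inside the folder to its list of problems."""
    return {str(p): check_dag_file(p) for p in sorted(Path(folder).glob("*.py"))}


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "./dags"
    for name, problems in check_dags_folder(folder).items():
        print(name, "OK" if not problems else "; ".join(problems))
```

Run it against the host-side ./dags folder. A file that passes here can still fail when Airflow imports it (for example, because of a missing package), so airflow list_dags inside the container remains the final check.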
