
DAGs not clickable on Google Cloud Composer webserver, but working fine on a local Airflow

I'm using Google Cloud Composer (managed Airflow on Google Cloud Platform) with image version composer-0.5.3-airflow-1.9.0 and Python 2.7, and I'm facing a weird issue: after importing my DAGs, they are not clickable from the Web UI (and there are no "Trigger DAG", "Graph view", ... buttons), while everything works perfectly when running a local Airflow.

Even if unusable from the webserver on Composer, my DAGs still exist. I can list them using the CLI ( list_dags ), describe them ( list_tasks ), and even trigger them ( trigger_dag ).

Minimal example reproducing the issue

A minimal example I used to reproduce the issue is shown below. Using a hook (here, GoogleCloudStorageHook ) is essential, since the bug on Composer only happens when a hook is used. Initially I was using a custom hook (in a custom plugin) and was facing the same issue.

Basically, the example lists all entries in a GCS bucket ( my-bucket ) and generates a DAG for each entry beginning with my_dag .

import datetime

from airflow import DAG
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.operators.bash_operator import BashOperator

google_conn_id = 'google_cloud_default'

gcs_conn = GoogleCloudStorageHook(google_conn_id)

bucket = 'my-bucket'
prefix = 'my_dag'

# This hook method call runs at DAG-parse time, in every process
# that parses this file (scheduler, workers, webserver).
entries = gcs_conn.list(bucket, prefix=prefix)

for entry in entries:
    dag_id = str(entry)

    dag = DAG(
        dag_id=dag_id,
        start_date=datetime.datetime.today(),
        schedule_interval='0 0 1 * *'
    )

    op = BashOperator(
        task_id='test',
        bash_command='exit 0',
        dag=dag
    )

    # Expose the DAG at module level so Airflow's DagBag picks it up.
    globals()[dag_id] = dag

Results on Cloud Composer

After importing this file to Composer, here's the result (I have 4 files beginning with my_dag in my-bucket ):

(Screenshot: DAGs list on Google Cloud Composer)

As I explained, the DAGs are not clickable, and the "Recent Tasks" and "DAG Runs" columns load forever. The "info" mark next to each DAG name says: This DAG isn't available in the webserver DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.

Of course, refreshing does not help, and when accessing the DAG Graph View by its direct URL ( https://****.appspot.com/admin/airflow/graph?dag_id=my_dag_1 ), it shows an error: DAG "my_dag_1" seems to be missing.

Results on local Airflow

When importing the script on a local Airflow, the webserver works fine:

(Screenshot: DAGs list on a local Airflow)

Some tests

If I replace the line entries = gcs_conn.list(bucket, prefix=prefix) with hard-coded values like entries = [u'my_dag_1', u'my_dag_2', u'my_dag_3', u'my_dag_4'] , then the DAGs are clickable on the Composer Web UI (and all the buttons in the "Links" column appear). From other tests I have made on my initial problem, it seems that calling a method on a hook (not just initializing the hook) causes the issue. Of course, DAGs in Composer work normally on simple examples (no hook method calls involved).
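Since the hard-coded list works while the hook call does not, one workaround worth sketching is to guard the hook call at parse time so that a failure degrades to an empty entry list instead of raising (an exception raised here would make the whole file fail to parse in whichever process hit it). This is only a sketch of mine, not something from the original setup, and the helper name safe_list_entries is hypothetical:

```python
def safe_list_entries(list_fn, bucket, prefix):
    """Call a listing function (e.g. a hook's .list method) at DAG-parse
    time, but never let an exception propagate: a raised exception would
    make the whole DAG file fail to parse in the process that runs it."""
    try:
        return [str(entry) for entry in list_fn(bucket, prefix=prefix)]
    except Exception:
        # Swallow the error and generate no DAGs in this process; another
        # process with working credentials can still generate them.
        return []
```

With this helper, the example's listing line would become entries = safe_list_entries(gcs_conn.list, bucket, prefix). Whether this actually makes the DAGs clickable on Composer depends on whether the webserver's failure is an exception at parse time, which the logs could not confirm.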

I have no idea why this happens. I have also inspected the logs (by setting logging_level = DEBUG in airflow.cfg ) but could not see anything wrong. I suspect a bug in the webserver, but I cannot get a meaningful stack trace. Webserver logs from Composer (hosted on App Engine) are not available, or at least I did not find a way to access them.

Has anyone experienced the same issue, or a similar one, with the Composer Web UI? I think the problem comes from the usage of hooks, but I may be wrong; it could just be a side effect. To be honest, I am lost after testing so many things, and I would be glad if someone could help me. Thanks!

Update

When deploying a self-managed webserver on Kubernetes following this guide: https://cloud.google.com/composer/docs/how-to/managing/deploy-webserver , my DAGs are clickable from this self-managed webserver.

The Composer webserver runs with a different service account than the nodes in the Composer GKE cluster. You should make sure you have assigned the appropriate role/permissions to your webserver's service account.

E.g. if your webserver's URL is:

foo-tp.appspot.com

then the service account is:

foo-tp@appspot.gserviceaccount.com
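The host-to-service-account mapping above can be expressed as a small helper. This is just a sketch; the function name is my own, and it only covers the *.appspot.com pattern shown above:

```python
def webserver_service_account(webserver_host):
    """Map a Composer (App Engine) webserver host like 'foo-tp.appspot.com'
    to the App Engine default service account
    'foo-tp@appspot.gserviceaccount.com'."""
    # Strip an optional scheme and trailing slash, then take the project
    # id, i.e. everything before '.appspot.com'.
    host = webserver_host.split('://')[-1].rstrip('/')
    project = host.split('.appspot.com')[0]
    return project + '@appspot.gserviceaccount.com'
```

That service account is the one that needs the GCS read permissions for the hook call in the example to succeed inside the webserver.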
