
Issues with postgres_operator in Airflow dag

I am currently using Airflow 1.8.2 to schedule some EMR tasks and then execute some long-running queries on our Redshift cluster. For that purpose I am using the postgres_operator. The queries take about 30 minutes to run. However, once they are done, the connection never closes, and the operator keeps running for another hour and a half until it is terminated at the 2-hour mark every time. The message on termination is that the server closed the connection unexpectedly.

I've checked the logs on Redshift's end, and they show that the queries have run and the connection has been closed. Somehow, that is never communicated back to Airflow. Any pointers on what else I could check would be helpful. To give some more info, my Airflow installation is an extension of the https://github.com/puckel/docker-airflow docker image, runs in an ECS cluster, and has SQLite as its backend since I am still testing Airflow out. Also, I'm using the sequential executor. I would appreciate any help in this matter.

We had a similar issue before. I am using SQLAlchemy to connect to Redshift, but if you are using postgres_operator it should be very similar. It seems Redshift will close the connection if it doesn't see any activity on it during a long-running query, and in your case 30 minutes is a pretty long query.

Check https://www.postgresql.org/docs/9.5/static/runtime-config-connection.html. There are three settings, tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count, that control sending a keepalive message to Redshift to indicate "Hey, I am still alive."

You can pass the following as an argument, something like this: connect_args={'keepalives': 1, 'keepalives_idle': 60, 'keepalives_interval': 60}
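To show where connect_args would go, here is a minimal sketch, assuming SQLAlchemy with the psycopg2 driver. The helper names build_keepalive_args and make_engine and the URL in the usage comment are hypothetical placeholders, not part of Airflow or of the setup described in the question. The dictionary keys are the client-side libpq connection parameters; the optional keepalives_count entry is an addition that the snippet above does not set.

```python
def build_keepalive_args(idle_seconds=60, interval_seconds=60, probe_count=None):
    """Return libpq keepalive parameters as a dict suitable for connect_args.

    keepalives=1 turns client-side TCP keepalives on; the idle and interval
    values are in seconds. probe_count is optional and omitted when None.
    """
    args = {
        "keepalives": 1,
        "keepalives_idle": idle_seconds,
        "keepalives_interval": interval_seconds,
    }
    if probe_count is not None:
        args["keepalives_count"] = probe_count
    return args


def make_engine(url):
    """Create a SQLAlchemy engine whose connections use the keepalive settings."""
    # Imported here so the helper above can be used without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(url, connect_args=build_keepalive_args())


# Hypothetical usage; replace the URL with your own cluster's details:
# engine = make_engine("postgresql+psycopg2://user:password@host:5439/dbname")
```

Whether postgres_operator in Airflow 1.8.2 can pass these parameters through to the underlying connection is not something this sketch establishes, so check how your Airflow version builds its psycopg2 connection.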


 