
Redshift + SQLAlchemy long query hangs

I'm doing something along the lines of:

import sqlalchemy

conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
engine = sqlalchemy.create_engine(conn_string, execution_options={'autocommit': True},
                                  encoding='utf-8', isolation_level="AUTOCOMMIT")
conn = engine.connect()
rows = conn.execute(sql_query)

To run queries on a Redshift cluster. Lately, I've been doing maintenance tasks such as running vacuum reindex on large tables that get truncated and reloaded every day.

The problem is that the command above takes around 7 minutes for a particular table (the table is huge, 60 million rows across 15 columns), and when I run it using the method above it just never finishes and hangs. I can see in the cluster dashboard in AWS that parts of the vacuum command are being run for about 5 minutes and then it just stops. No Python errors, no errors on the cluster, nothing at all.

My guess is that the connection is lost during the command. So, how do I prove my theory? Has anybody else run into this issue? What do I change in the connection string to keep it alive longer?

EDIT:

I changed my connection to this after the comments here:

engine = sqlalchemy.create_engine(conn_string,
                                  execution_options={'autocommit': True},
                                  encoding='utf-8',
                                  connect_args={"keepalives": 1, "keepalives_idle": 60,
                                                "keepalives_interval": 60},
                                  isolation_level="AUTOCOMMIT")

And it worked for a while. However, the same behaviour has started again with even larger tables, where the vacuum reindex actually takes around 45 minutes (at least that is my estimate; the command never finishes running in Python).

How can I make this work regardless of the query runtime?
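One workaround I am considering is to stop blocking on the VACUUM statement itself and instead poll its progress from a fresh connection each time. A minimal sketch, assuming the SVV_VACUUM_PROGRESS system view (its single row describes the vacuum currently in progress; the status check and poll interval here are assumptions):

import time
import sqlalchemy

engine = sqlalchemy.create_engine(conn_string, isolation_level="AUTOCOMMIT")

def wait_for_vacuum(poll_seconds=60):
    """Poll SVV_VACUUM_PROGRESS on a new connection each time, so the VACUUM
    keeps being tracked even if the connection that submitted it drops."""
    while True:
        with engine.connect() as conn:
            status = conn.execute(sqlalchemy.text(
                "SELECT status FROM svv_vacuum_progress"
            )).scalar()
        # No row, or a status starting with 'Complete', means the vacuum is done.
        if status is None or status.startswith('Complete'):
            return
        time.sleep(poll_seconds)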

It's most likely not a connection drop issue. To confirm this, try pushing a few million rows into a dummy table (something which takes more than 5 minutes) and see if the statement fails. Once a query has been submitted to Redshift, it keeps executing in the background regardless of whether your connection is shut down.
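For example, a long-running statement can be faked with a cross join. A minimal sketch, where dummy_load and some_big_table are placeholder names and conn_string is the one from the question:

import sqlalchemy

engine = sqlalchemy.create_engine(conn_string, isolation_level="AUTOCOMMIT")

# A CREATE TABLE AS with a cross join inflates the row count so the statement
# runs for several minutes. If it completes on the cluster (visible in the
# console or STL_QUERY) even after the Python call hangs, the connection is
# not at fault.
with engine.connect() as conn:
    conn.execute(sqlalchemy.text("""
        CREATE TABLE dummy_load AS
        SELECT a.*
        FROM some_big_table a
        CROSS JOIN (SELECT 1 FROM some_big_table LIMIT 100) b
    """))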

Now, coming to the problem itself: my guess is that you are running out of memory or disk space. Could you please elaborate and list out your Redshift setup (how many nodes of dc1/ds2)? Also, try running some admin queries and see how much space you have left on the disk. Sometimes when the cluster is loaded to the brim, a disk full error is thrown, but in your case the connection might be dropped well before the error reaches your Python shell.
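For example, a rough cluster-wide disk usage check. A minimal sketch, assuming access to the STV_PARTITIONS system table (visible only to superusers) and the conn_string from the question:

import sqlalchemy

engine = sqlalchemy.create_engine(conn_string)

# STV_PARTITIONS reports used and total 1 MB blocks per disk partition;
# summing them approximates how full the cluster is overall.
with engine.connect() as conn:
    pct = conn.execute(sqlalchemy.text(
        "SELECT SUM(used)::float / SUM(capacity) * 100 FROM stv_partitions"
    )).scalar()
print("Disk used: %.1f%%" % pct)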
