
Pandas leaving idle Postgres connections open after to_sql?

I do a lot of ETL with Pandas and Postgres. I have a ton of idle connections, many marked with COMMIT or ROLLBACK, and I am not sure how to keep them from sitting idle for long periods instead of closing. The main code I use to write to the database uses pandas to_sql:

def write_data_frame(self, data_frame, table_name):
    engine = create_engine(self.engine_string)
    data_frame.to_sql(name=table_name, con=engine, if_exists='append', index=False)

I know this is definitely not best practice for PostgreSQL and I should be doing something like passing params to a stored procedure or function, but this is how we are set up to get data frames from non-Postgres databases / data sources and upload them to Postgres.

My pgAdmin looks like this:

[image: pgAdmin screenshot]

Can someone please point me in the right direction on how to avoid this many idle connections in the future? Some of our database connections are meant to be long-lived, as they are continuous "batch" processes. But it seems like some one-off events are leaving connections open and idle.

Using the engine as a one-off is probably not ideal for you. If possible, you could make the engine a member of the class and refer to it as self.engine.
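
As a rough sketch of that idea: the class name `Loader` and its constructor are hypothetical (the question does not show the surrounding class), and an in-memory SQLite URL stands in for the Postgres connection string so the example runs without a server.

```python
import pandas as pd
from sqlalchemy import create_engine


class Loader:
    """Hypothetical wrapper class; name and constructor are assumptions."""

    def __init__(self, engine_string):
        # Create the engine (and its connection pool) once per object.
        self.engine = create_engine(engine_string)

    def write_data_frame(self, data_frame, table_name):
        # Every call reuses the same pool instead of building a new one.
        data_frame.to_sql(name=table_name, con=self.engine,
                          if_exists='append', index=False)


# SQLite in memory stands in for the real Postgres URL (an assumption).
loader = Loader("sqlite://")
loader.write_data_frame(pd.DataFrame({"a": [1, 2]}), "demo")
loader.write_data_frame(pd.DataFrame({"a": [3]}), "demo")
row_count = len(pd.read_sql("SELECT * FROM demo", loader.engine))
```

With a single engine, repeated writes share one pool, so each call no longer leaves its own pooled connection behind.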

Another option would be to explicitly dispose of the engine.

def write_data_frame(self, data_frame, table_name):
    engine = create_engine(self.engine_string)
    data_frame.to_sql(name=table_name, con=engine, if_exists='append', index=False)
    engine.dispose()

As noted in the docs,

This has the effect of fully closing all currently checked in database connections. Connections that are still checked out will not be closed, however they will no longer be associated with this Engine, so when they are closed individually, eventually the Pool which they are associated with will be garbage collected and they will be closed out fully, if not already closed on checkin.

This may also be a good use case for a try...except...finally block, since as written .dispose will only be called when the preceding code executes without error.
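
A minimal sketch of that, using try...finally without an except clause so the original error still propagates. It is written as a standalone function taking `engine_string` as a parameter, which is an adaptation of the method above rather than the asker's actual code.

```python
import pandas as pd
from sqlalchemy import create_engine


def write_data_frame(engine_string, data_frame, table_name):
    engine = create_engine(engine_string)
    try:
        data_frame.to_sql(name=table_name, con=engine,
                          if_exists='append', index=False)
    finally:
        # Runs whether or not to_sql raised, so the pooled
        # connections are closed in both cases.
        engine.dispose()
```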

I would much rather suggest that you pass connections like so:

with engine.connect() as connection:
    data_frame.to_sql(..., con=connection)

But the to_sql docs, as of this writing, indicate that you can't do that and that it will only accept an engine.
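
Newer pandas documentation describes `con` as a SQLAlchemy Engine or Connection, so on a reasonably recent pandas and SQLAlchemy the connection-passing pattern can be sketched as below. The table name `demo`, the sample data, and the in-memory SQLite URL are all stand-ins, and `engine.begin()` is used instead of `engine.connect()` so the write is committed when the block exits.

```python
import pandas as pd
from sqlalchemy import create_engine

# SQLite in memory stands in for the real Postgres URL (an assumption).
engine = create_engine("sqlite://")
data_frame = pd.DataFrame({"a": [1, 2]})

# engine.begin() checks out one connection, commits on success, rolls back
# on error, and returns the connection to the pool when the block exits.
with engine.begin() as connection:
    data_frame.to_sql(name="demo", con=connection,
                      if_exists="append", index=False)

row_count = len(pd.read_sql("SELECT * FROM demo", engine))
```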
