
Executing a large query with psycopg2

I'm trying to execute a large SELECT query (about 50 000 000 of 200 000 000 rows, 15 columns) and fetch all of the data into a pandas DataFrame using psycopg2. In the pgAdmin server status tool I can see that my query is active for about half an hour and then becomes idle. I read that this means the server is waiting for a new command. On the other hand, my Python script still doesn't have the data and is waiting for it too (there are no errors; it looks like the data is downloading).

To sum up: the database is waiting and Python is waiting. Should I keep waiting? Is there a chance of a happy ending, or is Python not able to process that amount of data?

Holy smokes, Batman! If your query takes more than a few minutes to execute, you ought to think of a different way to process your data! If you are returning 200 000 000 rows of 15 single-byte columns, that is already 3 gigabytes of raw data, assuming not a single byte of overhead, which is very unlikely. If those columns are 64-bit integers instead, that is already 24 gigabytes. This is a lot of in-memory data for Python to handle.
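The arithmetic above can be checked with a quick back-of-the-envelope sketch. The helper name `raw_size_gb` is made up for illustration, and it counts only raw value bytes in decimal gigabytes, ignoring any driver, Python object, or pandas overhead. It also runs the numbers for the 50 000 000 rows mentioned in the question.

```python
def raw_size_gb(rows: int, columns: int, bytes_per_value: int) -> float:
    """Raw payload size in decimal gigabytes (hypothetical helper, no overhead counted)."""
    return rows * columns * bytes_per_value / 1e9


if __name__ == "__main__":
    columns = 15
    for rows in (200_000_000, 50_000_000):
        one_byte = raw_size_gb(rows, columns, 1)   # single-byte columns
        int64 = raw_size_gb(rows, columns, 8)      # 64-bit integer columns
        print(f"{rows:>11,} rows: {one_byte:g} GB (1-byte), {int64:g} GB (64-bit)")
```

Even the smaller 50 000 000-row case comes to several gigabytes for 64-bit values before any per-object overhead is counted, which is in line with the point being made here.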

Have you considered what happens if your process fails during execution, or if the connection is interrupted? Your program will benefit from processing the rows in chunks, if your process allows it. If it really is not possible, consider approaches that operate on the database itself, such as using PL/pgSQL.
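One way to process rows in chunks with psycopg2 is a named (server-side) cursor combined with `fetchmany`. The following is only a minimal sketch under assumptions: the function names `iter_chunks` and `stream_query`, the cursor name, the chunk size, and the DSN and query in the usage note are all hypothetical, and whether chunking suits your workload depends on what you do with each chunk.

```python
from typing import Any, Iterator, List, Tuple


def iter_chunks(cursor: Any, chunk_size: int = 50_000) -> Iterator[List[Tuple]]:
    """Yield lists of rows from any DB-API cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def stream_query(dsn: str, query: str, chunk_size: int = 50_000) -> Iterator[List[Tuple]]:
    """Run `query` through a psycopg2 named cursor and yield it chunk by chunk."""
    import psycopg2  # imported lazily so iter_chunks works without psycopg2 installed

    conn = psycopg2.connect(dsn)
    try:
        # Giving the cursor a name makes psycopg2 use a server-side cursor,
        # so the full result set is not transferred to the client at once.
        with conn.cursor(name="big_query_cursor") as cur:
            cur.execute(query)
            yield from iter_chunks(cur, chunk_size)
    finally:
        conn.close()
```

A hypothetical call would look like `for rows in stream_query("dbname=mydb", "SELECT * FROM big_table"): handle(rows)`, where `handle` is whatever per-chunk work you need, such as building a small DataFrame from `rows`, aggregating it, and discarding it before the next chunk arrives. If the process fails partway through, only the current chunk is lost rather than half an hour of downloading.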

