
Executing a large query with psycopg2

I'm trying to execute a large SELECT query (about 50 000 000 of 200 000 000 rows, 15 columns) and fetch all of the data into a pandas DataFrame using psycopg2. In the pgAdmin server status tool I can see that my query is active for about half an hour and then becomes idle. I read that this means the server is waiting for a new command. On the other hand, my Python script still doesn't have the data and is waiting for it too (there are no errors; it looks like the data is downloading).

To sum up: the database is waiting, Python is waiting — should I keep waiting? Is there a chance of a happy ending? Or is Python simply unable to process that much data?

Holy smokes, Batman! If your query takes more than a few minutes to execute, you ought to think of a different way to process your data! If you are returning 200 000 000 rows of 15 single-byte columns, that is already 3 gigabytes of raw data, assuming not a single byte of overhead, which is very unlikely. If those columns are 64-bit integers instead, that is already 24 gigabytes. This is a lot of data for Python to hold in memory.
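The back-of-envelope arithmetic above can be checked in a couple of lines (the 200 000 000 × 15 figures are the full-table size mentioned in the question):

```python
rows, cols = 200_000_000, 15

one_byte_total = rows * cols        # 1 byte per column value
int64_total = rows * cols * 8       # 8 bytes per 64-bit integer

print(one_byte_total)   # 3_000_000_000 bytes, i.e. ~3 GB
print(int64_total)      # 24_000_000_000 bytes, i.e. ~24 GB
```

And that is raw payload only — per-row and per-object overhead on the client side makes the real footprint considerably larger.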

Have you considered what happens if your process fails partway through, or if the connection is interrupted? Your program will benefit from processing the rows in chunks, if your workflow allows it. If it really is not possible, consider approaches that operate on the database itself, such as using PL/pgSQL.
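A minimal sketch of the chunked approach: a generator that drains any DB-API cursor in fixed-size batches, so each batch can be processed (aggregated, written to disk) and discarded instead of accumulating all 50 million rows at once. The connection string, query, and column names below are hypothetical placeholders. Note that with psycopg2 a plain cursor still buffers the whole result set client-side; a *named* cursor (`conn.cursor(name=...)`) declares a server-side cursor so rows stay on the server until fetched.

```python
def iter_chunks(cursor, chunk_size=100_000):
    """Yield lists of at most chunk_size rows from an open DB-API cursor."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows

# Hypothetical usage with psycopg2 + pandas (replace DSN/query/columns):
#
#   import pandas as pd
#   import psycopg2
#
#   conn = psycopg2.connect("dbname=mydb")         # your connection string
#   cur = conn.cursor(name="big_query")            # named => server-side cursor
#   cur.itersize = 100_000                         # rows per network round trip
#   cur.execute("SELECT a, b FROM big_table")      # your query
#
#   for chunk in iter_chunks(cur):
#       frame = pd.DataFrame(chunk, columns=["a", "b"])
#       ...  # aggregate or persist this chunk, then let it go
#
#   cur.close()
#   conn.close()
```

If you truly need one final DataFrame, `pandas.read_sql(..., chunksize=...)` offers the same streaming pattern, but at 50 million rows you should question whether the whole result must ever be in memory at once.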
