Using Python/PyGreSQL, how can I efficiently handle a large result set?

I have a query whose result set is about 9 million rows.

I need to do some processing for each row, and the code currently does this:

query = conn.query(sql)
results = query.getresult()  # a list with one tuple per row

for row in results:
    pass  # per-row processing goes here

I'm not sure, but I suspect getresult() pulls down the entire result set. Is that the case? I imagine there's a way to pull only chunks of the result set across the wire as needed, but I didn't immediately see anything like that in the pg module docs.

Is it possible to do this with the pgdb module instead, or with some other approach?

My concern is memory on the application machine: I'd rather not load millions of rows into memory all at once if I can help it.

Is this even worth worrying about?

If the module follows the Python Database API specification (PEP 249), you can use a cursor:

curs = conn.cursor()
curs.execute('select * from bigtable')

then call curs.fetchone() or curs.fetchmany(chunksize) repeatedly until it returns None (fetchone) or an empty sequence (fetchmany).
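A minimal sketch of the fetchmany() approach, written against the generic DB-API 2.0 interface. The generator iter_rows is a hypothetical helper name: it pulls rows chunksize at a time and hands them out one by one, so the loop body looks the same as it did over getresult(). Python's built-in sqlite3 module stands in for pgdb only so the snippet runs without a PostgreSQL server; the table name bigtable and the chunk size are illustrative.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def iter_rows(cursor: Any, chunksize: int = 10_000) -> Iterator[Sequence[Any]]:
    """Yield rows one at a time while fetching them `chunksize` at a time.

    Hypothetical helper; works with any DB-API 2.0 cursor on which
    execute() has already been called.
    """
    while True:
        chunk = cursor.fetchmany(chunksize)
        if not chunk:  # an empty sequence means the result set is exhausted
            break
        yield from chunk


# Demo: sqlite3 stands in for pgdb so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER)")
conn.executemany("INSERT INTO bigtable VALUES (?)", [(i,) for i in range(25)])

curs = conn.cursor()
curs.execute("SELECT id FROM bigtable ORDER BY id")
total = sum(row[0] for row in iter_rows(curs, chunksize=10))
print(total)  # 300
```

Only one chunk of Python row tuples exists at any moment, instead of a 9-million-element list.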

pgdb's cursors are iterators:

cursor = conn.cursor()
cursor.execute(sql)

for row in cursor:
    pass  # do something with row

where conn was created with pgdb.connect(...).
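A caveat, stated as my understanding rather than as part of this answer: pgdb is built on libpq, which normally transfers the complete result set to the client when execute() returns, so iterating the cursor mainly saves the cost of building millions of Python tuples at once, not the network transfer. If you need the rows to stay on the server, PostgreSQL's own DECLARE ... CURSOR and FETCH statements can be sent through any DB-API cursor. The helper below (iter_server_side, a hypothetical name, as is the default cursor name big_cursor) is a sketch of that swapped-in technique, assuming autocommit is off so the DECLARE runs inside a transaction.

```python
from typing import Any, Iterator, List, Sequence


def iter_server_side(
    cursor: Any,
    sql: str,
    chunksize: int = 10_000,
    name: str = "big_cursor",
) -> Iterator[List[Sequence[Any]]]:
    """Yield lists of rows, fetched in chunks from a PostgreSQL server-side cursor.

    Assumptions (hypothetical helper, not part of pgdb):
    - `cursor` belongs to a connection with autocommit off, so DECLARE runs
      inside a transaction, which a non-holdable cursor requires.
    - `sql` and `name` are trusted strings; they are interpolated directly.
    """
    cursor.execute(f"DECLARE {name} NO SCROLL CURSOR FOR {sql}")
    try:
        while True:
            # Each FETCH asks the server for at most `chunksize` more rows.
            cursor.execute(f"FETCH FORWARD {int(chunksize)} FROM {name}")
            rows = cursor.fetchall()
            if not rows:  # no rows left on the server side
                break
            yield rows
    finally:
        cursor.execute(f"CLOSE {name}")
```

You would loop over iter_server_side(conn.cursor(), "select * from bigtable") and then call conn.commit() or conn.rollback() when done, since the cursor only lives until the transaction ends.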

I'm not sure how getresult() behaves, but another option is PL/Python:

The PL/Python procedural language allows PostgreSQL functions to be written in the Python language.

That lets you process the rows inside the database itself, so they never have to cross the network. It might not suit what you need to do, but it's worth a look.

Use cursor.fetchmany(), and explicitly set cursor.arraysize to a batch size that gives you the balance you need between performance and memory usage.
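A minimal sketch of the arraysize approach, again using sqlite3 only as a stand-in DB-API driver. Per the DB-API spec, once cursor.arraysize is set, calling fetchmany() with no argument returns at most that many rows, so only one batch of rows is held in Python at a time. The table and the batch size of 20,000 are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO bigtable VALUES (?, ?)", [(i, f"row {i}") for i in range(50_000)]
)

curs = conn.cursor()
# fetchmany() without an argument returns up to curs.arraysize rows.
# Larger batches mean fewer fetch calls but more rows held in memory at once.
curs.arraysize = 20_000
curs.execute("SELECT id, payload FROM bigtable")

batch_sizes = []
while True:
    rows = curs.fetchmany()
    if not rows:  # empty list: result set exhausted
        break
    batch_sizes.append(len(rows))
    # ... process `rows` here; the previous batch can now be garbage-collected ...

print(batch_sizes)  # [20000, 20000, 10000]
```

Whether each batch also corresponds to one network round trip depends on the driver, so measure with your real driver before settling on a number.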

I have jobs written with cx_Oracle (which also implements the DB-API spec) that move tables with several billion rows across the network in batches of 20,000 records. It takes a while, but I'm not blowing out server memory on either the source or the target side.
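The batch-copy job described above can be sketched generically: read batch_size rows with fetchmany(), write them with executemany(), commit, repeat. copy_in_batches is a hypothetical helper, not part of cx_Oracle or pgdb, and the two in-memory sqlite3 databases only stand in for the source and target servers so the sketch runs anywhere.

```python
import sqlite3


def copy_in_batches(src_conn, dst_conn, select_sql, insert_sql, batch_size=20_000):
    """Copy rows between two DB-API connections, one batch at a time.

    Hypothetical helper. Returns the number of rows copied. Placeholders in
    `insert_sql` must use the target driver's paramstyle (sqlite3 uses '?';
    other drivers differ).
    """
    src = src_conn.cursor()
    dst = dst_conn.cursor()
    src.execute(select_sql)
    copied = 0
    while True:
        rows = src.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(insert_sql, rows)
        dst_conn.commit()  # keep each target transaction to one batch
        copied += len(rows)
    return copied


# Demo with two separate in-memory databases as "source" and "target".
src_conn = sqlite3.connect(":memory:")
src_conn.execute("CREATE TABLE bigtable (id INTEGER, payload TEXT)")
src_conn.executemany(
    "INSERT INTO bigtable VALUES (?, ?)", [(i, str(i)) for i in range(45_000)]
)
dst_conn = sqlite3.connect(":memory:")
dst_conn.execute("CREATE TABLE bigtable_copy (id INTEGER, payload TEXT)")

n = copy_in_batches(
    src_conn,
    dst_conn,
    "SELECT id, payload FROM bigtable",
    "INSERT INTO bigtable_copy VALUES (?, ?)",
)
print(n)  # 45000
```

Committing per batch keeps the target's transactions small, at the cost that a failure midway leaves a partial copy; commit once at the end instead if you need all-or-nothing behavior.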
