Python psycopg2 cursors

Question

From psycopg2 documentation:

When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client. If the dataset is too large to be practically handled on the client side, it is possible to create a server side cursor.

I would like to query a table with possibly thousands of rows and perform some action for each one. Will normal cursors actually bring the entire data set to the client? That doesn't sound very reasonable. The code is something along the lines of:

import psycopg2

conn = psycopg2.connect(url)
cursor = conn.cursor()
cursor.execute(sql)
for row in cursor:
    pass  # do some stuff
cursor.close()

I would expect this to be a streaming operation. My second question is about the scope of cursors. Inside my loop I would like to update another table. Do I need to open a new cursor and close it every time? Each item update should be in its own transaction, as I might need to roll it back.

for row in cursor:
    anotherCursor = anotherConn.cursor()
    anotherCursor.execute(update)
    if somecondition:
        anotherConn.commit()
    else:
        anotherConn.rollback()
cursor.close()

======== EDIT: MY ANSWER TO FIRST PART ========

Ok, I will try to answer the first part of my question. The normal cursors actually bring the entire data set as soon as you call execute, before even starting to iterate the result set. You can verify that by checking the memory footprint of the process at each step. But the need for a server side cursor is actually due to postgres server and not the client, and is documented here: http://www.postgresql.org/docs/9.3/static/sql-declare.html

Now, this is not immediately apparent from the documentation, but such cursors can actually be created temporarily for the duration of the transaction. There is no need to explicitly create a function in the database that returns a refcursor for the specific SQL statement, etc. With psycopg2 you only need to give a name when obtaining the cursor, and a temporary cursor will be created for that transaction. So instead of:

 cursor = conn.cursor()

you just need to do:

 cursor = conn.cursor('mycursor')

That's it and it works. I assume the same thing is done under the covers when using JDBC, when setting fetchSize. It's just a bit more transparent. See docs here: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
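As a rough sketch of the named-cursor approach described above: the helper below wraps a server-side cursor in a generator so rows are pulled from the server in batches rather than all at once. The function name stream_rows, the default cursor name, and the batch size are illustrative choices, not anything from the original code; itersize is the psycopg2 attribute that controls how many rows are fetched per round trip while iterating, which is roughly the counterpart of JDBC's fetchSize.

```python
def stream_rows(conn, sql, params=None, name="mycursor", itersize=2000):
    """Yield rows one at a time through a server-side (named) cursor.

    `conn` is expected to be a psycopg2 connection. The helper name and
    defaults are assumptions made for this example.
    """
    cur = conn.cursor(name=name)  # passing a name requests a server-side cursor
    cur.itersize = itersize       # rows fetched per round trip while iterating
    try:
        cur.execute(sql, params)
        for row in cur:
            yield row
    finally:
        cur.close()               # also closes the cursor on the server


# Usage (needs a reachable database; DSN and table name are placeholders):
#   import psycopg2
#   conn = psycopg2.connect("dbname=test")
#   for row in stream_rows(conn, "SELECT * FROM big_table"):
#       ...
```

Because the rows arrive in batches of itersize, client memory stays bounded by the batch size rather than by the size of the full result set.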

You can test that this works by querying the pg_cursors view inside the same transaction. The server-side cursor appears once the named client-side cursor has executed its query and disappears after the client-side cursor is closed. So bottom line: I'm happy to make that change to my code, but I must say this was a big gotcha for someone not that experienced with postgres.
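The pg_cursors check can be sketched roughly as follows. Both helper names are made up for this example, and `conn` is assumed to be a psycopg2 connection; the second, unnamed cursor runs on the same connection, so it sees the same transaction as the named one.

```python
def server_cursor_names(conn):
    """Return the names of cursors currently open in this session."""
    cur = conn.cursor()  # ordinary client-side cursor on the same connection
    try:
        cur.execute("SELECT name FROM pg_cursors")
        return [name for (name,) in cur.fetchall()]
    finally:
        cur.close()


def named_cursor_visibility(conn, sql, name="mycursor"):
    """Report whether `name` is listed in pg_cursors while open and after close."""
    named = conn.cursor(name=name)
    named.execute(sql)  # the server-side cursor is declared at this point
    during = name in server_cursor_names(conn)
    named.close()
    after = name in server_cursor_names(conn)
    return during, after
```

Against a real database the expectation, based on the behaviour described above, is that the name is listed while the cursor is open and gone after it is closed.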

Answer 1

Actually, you have already answered the question ;).

Yes, you should use a server-side cursor to get the records streamed: http://initd.org/psycopg/docs/usage.html#server-side-cursors

From docs:

CREATE FUNCTION reffunc(refcursor) RETURNS refcursor AS $$
BEGIN
    OPEN $1 FOR SELECT col FROM test;
    RETURN $1;
END;
$$ LANGUAGE plpgsql;

And in code:

cur1 = conn.cursor()
cur1.callproc('reffunc', ['curname'])

cur2 = conn.cursor('curname')
for record in cur2:     # or cur2.fetchone, fetchmany...
    # do something with record
    pass

Yes, you should open a new cursor if you want to fetch the rows through a server-side cursor.
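For the second part of the question, here is a minimal sketch of the read-and-update loop, assuming two separate psycopg2 connections as in the question. The function name, its parameters, and the is_ok callback are hypothetical. In psycopg2, commit and rollback belong to the connection rather than the cursor, so one write cursor can be reused for every row. The reads go through a different connection because a named cursor without withhold=True only lives as long as its transaction, so committing on the reading connection would end it.

```python
def update_each_row(read_conn, write_conn, select_sql, update_sql, is_ok):
    """Stream rows from read_conn and apply one update per row on write_conn.

    Each update is committed or rolled back on its own. `is_ok(row)` is a
    caller-supplied check (hypothetical) deciding which of the two happens.
    Returns a (committed, rolled_back) pair of counts.
    """
    committed = rolled_back = 0
    read_cur = read_conn.cursor(name="row_stream")  # server-side cursor
    write_cur = write_conn.cursor()                 # plain cursor, reused per row
    try:
        read_cur.execute(select_sql)
        for row in read_cur:
            # Assumes the update's placeholders match the selected columns.
            write_cur.execute(update_sql, row)
            if is_ok(row):
                write_conn.commit()    # ends this row's transaction
                committed += 1
            else:
                write_conn.rollback()  # discards this row's update only
                rolled_back += 1
    finally:
        write_cur.close()
        read_cur.close()
    return committed, rolled_back
```

The next execute on write_cur after a commit or rollback starts a fresh transaction, which gives each row the isolation the question asks for.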

solution1
3 2015-05-24 19:46:31
