
Using the cx_Oracle arraysize parameter with pandas read_sql

When selecting large amounts of data from Oracle with cx_Oracle, I found it incredibly slow whether I used pandas read_sql() or the cx_Oracle Cursor fetchall() method.

Performance is drastically improved by increasing the arraysize attribute of the Cursor, which lets me get decent performance out of fetchall(). However, pandas read_sql() takes a Connection object as input and creates the cursor inside the function, so it's not obvious to me how I can apply that same setting and still take advantage of read_sql(). Have I missed something?
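For reference, this is roughly what the fast raw-cursor version looks like (a minimal sketch; the connection details, arraysize value, and table name are placeholders):

import cx_Oracle

# Placeholder connection details
conn = cx_Oracle.connect("user", "pass", "host:port/dbname")
cur = conn.cursor()
cur.arraysize = 5000          # default is 100; larger values mean fewer round trips
cur.execute("SELECT * FROM some_large_table")
rows = cur.fetchall()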

When you use SQLAlchemy to create an engine for the database connection, you can pass the arraysize argument and it will be applied to the cursors the engine creates:

import sqlalchemy
import pandas as pd

# arraysize is passed through to the cx_Oracle cursors created by the engine
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:pass@host:port/dbname", arraysize=50)
pd.read_sql("query ...", engine)

See "Additional Connect Arguments" in the SQLAlchemy Oracle dialect docs: http://docs.sqlalchemy.org/en/rel_1_0/dialects/oracle.html#module-sqlalchemy.dialects.oracle.cx_oracle
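If you would rather hand read_sql() a raw cx_Oracle connection instead of a SQLAlchemy engine, one possible workaround is to subclass cx_Oracle.Connection (subclassing connections and cursors is supported by cx_Oracle) so that every cursor it hands out gets a larger arraysize. This is only a sketch with placeholder credentials and an assumed arraysize of 5000; note also that pandas only officially supports SQLite for raw DBAPI connections, so the SQLAlchemy engine remains the cleaner route:

import cx_Oracle
import pandas as pd

class BigArraysizeConnection(cx_Oracle.Connection):
    # Every cursor created from this connection gets a larger arraysize,
    # including the one read_sql() creates internally.
    def cursor(self):
        c = super().cursor()
        c.arraysize = 5000
        return c

conn = BigArraysizeConnection("user", "pass", "host:port/dbname")
df = pd.read_sql("SELECT * FROM some_large_table", conn)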
