
Using the cx_Oracle arraysize parameter with pandas read_sql

When selecting large amounts of data from Oracle with cx_Oracle, I found it incredibly slow whether I used pandas read_sql() or the cx_Oracle Cursor fetchall() method.

Performance is drastically improved by increasing the arraysize attribute of the Cursor, which lets me get decent performance out of fetchall(). However, pandas read_sql() takes a Connection object as input and creates the cursor inside the function, so it's not obvious to me how I can apply that same setting and still take advantage of read_sql(). Have I missed something?
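For reference, this is roughly what the fast raw-cursor version looks like (a minimal sketch; the connection details, arraysize value, and table name are placeholders):

import cx_Oracle

# Placeholder connection details
conn = cx_Oracle.connect("user", "pass", "host:port/dbname")
cur = conn.cursor()
cur.arraysize = 5000          # default is 100; larger values mean fewer round trips
cur.execute("SELECT * FROM some_large_table")
rows = cur.fetchall()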

When you use SQLAlchemy to create an engine for the database connection, you can pass the arraysize argument and it will be applied to the cursors the engine creates:

import sqlalchemy
import pandas as pd

# arraysize is passed through to the cx_Oracle cursors created by the engine
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:pass@host:port/dbname", arraysize=50)
pd.read_sql("query ...", engine)

See "Additional Connect Arguments" in the SQLAlchemy Oracle dialect docs: http://docs.sqlalchemy.org/en/rel_1_0/dialects/oracle.html#module-sqlalchemy.dialects.oracle.cx_oracle
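If you would rather hand read_sql() a raw cx_Oracle connection instead of a SQLAlchemy engine, one possible workaround is to subclass cx_Oracle.Connection (subclassing connections and cursors is supported by cx_Oracle) so that every cursor it hands out gets a larger arraysize. This is only a sketch with placeholder credentials and an assumed arraysize of 5000; note also that pandas only officially supports SQLite for raw DBAPI connections, so the SQLAlchemy engine remains the cleaner route:

import cx_Oracle
import pandas as pd

class BigArraysizeConnection(cx_Oracle.Connection):
    # Every cursor created from this connection gets a larger arraysize,
    # including the one read_sql() creates internally.
    def cursor(self):
        c = super().cursor()
        c.arraysize = 5000
        return c

conn = BigArraysizeConnection("user", "pass", "host:port/dbname")
df = pd.read_sql("SELECT * FROM some_large_table", conn)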
