Python cx_Oracle query speed
It takes a long time to query our Oracle database with Python using cx_Oracle.
I connect to the Oracle db using cx_Oracle, and query using a standard SQL line like:
select * from t1;
Results are returned as a pandas dataframe.
I have not tested if this holds true when I include row limits in the cx_Oracle queries.
Tune the application architecture.
Tune the network. Particularly the SDU size and socket buffer sizes. (If it helps, there is an alternative description of these in the Oracle Net Easy Connect Plus Whitepaper.)
Tune cx_Oracle, particularly Cursor.arraysize.
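To illustrate the arraysize point, here is a minimal sketch of batched fetching through the DB-API. It uses sqlite3 only because it needs no server; the `fetch_all` helper name is mine, but the `cursor.arraysize`/`fetchmany()` pattern is the same one cx_Oracle uses (where the default arraysize is 100, and raising it reduces network round trips):

```python
def fetch_all(conn, sql, arraysize=1000):
    """Fetch all rows in batches of `arraysize` rows.

    With cx_Oracle, a larger Cursor.arraysize (default 100)
    means fewer round trips between client and database.
    """
    cur = conn.cursor()
    cur.arraysize = arraysize
    cur.execute(sql)
    rows = []
    while True:
        batch = cur.fetchmany()  # returns up to `arraysize` rows per call
        if not batch:
            break
        rows.extend(batch)
    return rows
```

With cx_Oracle you would pass a connection from `cx_Oracle.connect(...)` instead; the rest of the code is unchanged, which is the point of the DB-API.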
@Christopher Jones and @Anthony Tuininga pointed to next-level answers in solving why queries were returning so slowly. In my #Caveat section, I noted that I had no
WHERE rownum < 10
in my queries. Including this did indeed reduce query times. My bad.
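The effect of a row limit can be sketched with a driver-agnostic example (sqlite3 is used here only because it needs no server; in Oracle the limiting clause would be `WHERE rownum < 10`, or `FETCH FIRST ... ROWS ONLY` on 12c and later):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(1000)])
conn.commit()

# Unrestricted query: every row travels from the database to the client.
df_all = pd.read_sql("SELECT * FROM t1", conn)

# Row-limited query: only a handful of rows are transferred.
# (sqlite uses LIMIT; Oracle's equivalent is WHERE rownum < 10.)
df_few = pd.read_sql("SELECT * FROM t1 LIMIT 9", conn)
```

Only the transferred data differs; the query plan on a real Oracle instance may also stop early, which is where the time savings came from.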
Also, a command like
create table t1 as
select *
from table1
results in instant feedback, since nothing is being received from the db. It was just a matter of issuing a query that demanded a lot of data back. An SQL client has those row limits built in to prevent people like me from asking these questions! :)
When writing to Oracle, all df columns which are String objects are by default converted to CLOB (Character Large OBject), which has a maximum length of 2,147,483,647. This means that tables are very likely far larger than necessary.
When using df.to_sql, you typically use something like the following:
df.to_sql("tablename",
connection_string,
schema='myschema',
if_exists='replace',
index=False
)
However, omitting the optional parameter dtype may result in CLOBs.
To overcome this, you can manually provide dtypes (assuming sqlalchemy is imported as sa):
df.to_sql("tablename",
connection_string,
schema='myschema',
if_exists='replace',
index=False,
dtype={"RUN_DT": sa.types.DateTime(),
"fname": sa.types.VARCHAR(length=50),
"lname": sa.types.VARCHAR(length=250),
"FROM_DT": sa.types.DateTime(),
"TO_DT": sa.types.DateTime(),
"UUID": sa.types.VARCHAR(length=50)}
)
But what if you have many tables and many columns and you need a generalized solution?
@Parfait provided the perfect solution to this issue on SO here.
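A minimal sketch of that kind of generalized mapping (the helper name `varchar_dtypes` is mine, not from the linked answer): build the dtype dict programmatically, sizing each string column's VARCHAR to its longest value so to_sql never falls back to CLOB:

```python
import pandas as pd
import sqlalchemy as sa


def varchar_dtypes(df):
    """Map every object (string) column of `df` to a VARCHAR sized to its
    longest value, so DataFrame.to_sql does not default to CLOB."""
    return {
        col: sa.types.VARCHAR(length=int(df[col].astype(str).str.len().max()))
        for col in df.columns[df.dtypes == "object"]
    }
```

You would then call `df.to_sql("tablename", connection_string, ..., dtype=varchar_dtypes(df))` for any table, without hand-writing a dtype dict per column.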