
Python cx_Oracle query speed

Problem

It takes a long time to query our Oracle database with Python using cx_Oracle.

Method

I connect to the Oracle DB using cx_Oracle and query it with a standard SQL statement like:

select * from t1;

Results are returned as a pandas DataFrame.

Observations

  • Querying from Spyder and Jupyter Notebooks is equally slow.
  • Querying from an SQL client like DBeaver returns results in roughly one tenth of the time.

Caveat

I have not tested whether this holds true when I include row limits in the cx_Oracle queries.

@Christopher Jones and @Anthony Tuininga pointed to the next-level answers explaining why queries were returning so slowly. In my Caveat section, I noted that I had no

WHERE rownum < 10

in my queries. Including this did indeed reduce query times. My bad.
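For illustration, a small helper along these lines can tack a rownum limit onto a query before handing it to pandas. The function names and the fetch helper are my own illustrative additions, not from the original post:

```python
import pandas as pd


def build_query(table, limit=None):
    """Build a SELECT over the whole table, optionally capped with a rownum limit."""
    sql = "select * from {}".format(table)
    if limit is not None:
        sql = "{} where rownum < {}".format(sql, limit)
    return sql


def fetch_table(connection, table, limit=None):
    # `connection` is assumed to be an open cx_Oracle connection (not shown here).
    return pd.read_sql(build_query(table, limit), con=connection)
```

Asking for, say, `fetch_table(conn, "t1", 10)` keeps the result set small while exploring, which is exactly what SQL clients like DBeaver do by default.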

Also, a command like

create table t1 as
select *
from table1

results in instant feedback, since nothing is being received from the db. It was just a matter of issuing a query that demanded a lot of data back. An SQL client has those row limits built in to prevent people like me from asking these questions! :)

Addressing Slow Write Speeds

When writing to Oracle, all DataFrame columns that are String objects are by default converted to CLOBs (character large objects), which can be up to 2,147,483,647 characters long. This means that tables are very likely far larger than necessary.

When using df.to_sql, you typically use something like the following:

df.to_sql("tablename",
          connection_string,
          schema='myschema',
          if_exists='replace',
          index=False
          )

However, omitting the optional dtype parameter may result in CLOBs.

A Solution

To overcome this, you can manually provide the dtype mapping (where sa comes from import sqlalchemy as sa):

df.to_sql("tablename",
          connection_string,
          schema='myschema',
          if_exists='replace',
          index=False,
          dtype={"RUN_DT": sa.types.DateTime(),
                 "fname": sa.types.VARCHAR(length=50),
                 "lname": sa.types.VARCHAR(length=250),
                 "FROM_DT": sa.types.DateTime(),
                 "TO_DT": sa.types.DateTime(),
                 "UUID": sa.types.VARCHAR(length=50)}
          )

A Better, Generalized Solution

But what if you have many tables and many columns, and you need a generalized solution?

@Parfait provided the perfect solution to this issue on SO here.
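One common way to generalize this (a sketch in the same spirit as that answer, not a verbatim copy of it) is to scan the DataFrame's object columns and build the dtype dict automatically, sizing each VARCHAR to the longest value in that column:

```python
import pandas as pd
import sqlalchemy as sa


def varchar_dtypes(df):
    """Map every object (string) column to a VARCHAR sized to its longest value,
    so df.to_sql does not fall back to CLOB for those columns."""
    dtype = {}
    for col in df.select_dtypes(include="object").columns:
        max_len = int(df[col].astype(str).str.len().max())
        dtype[col] = sa.types.VARCHAR(length=max(max_len, 1))
    return dtype
```

The resulting dict can then be passed straight through, e.g. df.to_sql("tablename", connection_string, dtype=varchar_dtypes(df), ...), with non-string columns left to pandas' default type inference.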
