Downloading a table in batches with cx_Oracle

Question

I need to download a large table from an Oracle database to a Python server using cx_Oracle. However, RAM is limited on the Python server, so I need to do it in batches.

I already know how to download a whole table in one go:

import cx_Oracle
import pandas as pd

user = ''
pwd = ''
tns = '(Description = ...'
orcl = cx_Oracle.connect(user, pwd, tns)
curs = orcl.cursor()
printHeader = True
tabletoget = 'BIGTABLE'
sql = "SELECT * FROM " + "SCHEMA." + tabletoget
# read_sql runs the query itself, so a separate curs.execute(sql) is not needed
data = pd.read_sql(sql, orcl)
data.to_csv(tabletoget + '.csv')

What I'm not sure about is how to load, say, 10,000 rows at a time, save each batch to a CSV file, and then rejoin the pieces.

Answer 1

You can use cx_Oracle directly to perform this sort of batching:

curs.arraysize = 10000
curs.execute(sql)
while True:
    # fetchmany() with no argument returns up to curs.arraysize rows
    rows = curs.fetchmany()
    if rows:
        write_to_csv(rows)
    if len(rows) < curs.arraysize:
        break
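The loop above calls write_to_csv, which the answer does not define. A minimal sketch of such a helper, using only the standard csv module, might look like the following. The path and header parameters and the default file name are assumptions added for illustration; they are not part of the original answer. Because each batch is appended to the same file, there is nothing to rejoin afterwards.

```python
import csv
import os


def write_to_csv(rows, path="BIGTABLE.csv", header=None):
    """Append one batch of rows to a CSV file (hypothetical helper).

    rows   -- a list of tuples, as returned by cursor.fetchmany()
    path   -- output file; the default name is only a placeholder
    header -- optional list of column names, written once when the
              file does not exist yet or is still empty
    """
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None and is_new:
            writer.writerow(header)
        writer.writerows(rows)
```

If column names are wanted, they can be taken from the cursor after execute(), for example header=[d[0] for d in curs.description].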

If you are using Oracle Database 12c or higher, you can also use the OFFSET and FETCH NEXT ROWS options, like this:

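offset = 0
numRowsInBatch = 10000
while True:
    # OFFSET and FETCH NEXT each need the ROWS keyword; without an ORDER BY,
    # the row order (and so the batch boundaries) is not guaranteed to be stable
    curs.execute("select * from tabletoget offset :offset rows fetch next :nrows rows only",
            offset=offset, nrows=numRowsInBatch)
    rows = curs.fetchall()
    if rows:
        write_to_csv(rows)
    if len(rows) < numRowsInBatch:
        break
    offset += len(rows)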

This option isn't as efficient as the first one, since it re-runs the query for every batch and so gives the database more work to do, but it may suit you better depending on your circumstances.

Neither of these examples uses pandas directly. I am not particularly familiar with that package, but if you (or someone else) can adapt this appropriately, hopefully this will help!

Answer 2

You can achieve your result like this. Here I am loading the data into a DataFrame (df).

import cx_Oracle
import time
import pandas

user = "test"
pw = "test"
dsn="localhost:port/TEST"

con = cx_Oracle.connect(user,pw,dsn)
start = time.time()
cur = con.cursor()
cur.arraysize = 10000
try:
    cur.execute( "select * from test_table" )
    names = [ x[0] for x in cur.description]
    rows = cur.fetchall()
    df=pandas.DataFrame( rows, columns=names)
    print(df.shape)
    print(df.head())
finally:
    if cur is not None:
        cur.close()

elapsed = (time.time() - start)
print(elapsed, "seconds")
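Note that fetchall() above still pulls the whole table into memory at once. A batched variant, sketched here under the assumption that each batch should be appended to a single CSV file, could combine fetchmany() with a per-batch DataFrame. The function name export_in_batches and its parameters are made up for illustration; it expects any DB-API cursor, such as the one returned by con.cursor() above.

```python
import pandas as pd


def export_in_batches(cursor, sql, csv_path, batch_size=10000):
    """Run sql and append the result to csv_path one batch at a time.

    Only batch_size rows are held in memory at once.
    Returns the total number of rows written.
    """
    cursor.arraysize = batch_size
    cursor.execute(sql)
    names = [col[0] for col in cursor.description]
    total = 0
    first = True
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        df = pd.DataFrame(rows, columns=names)
        # The first batch creates the file and writes the header row;
        # later batches are appended without a header.
        df.to_csv(csv_path, mode="w" if first else "a",
                  header=first, index=False)
        first = False
        total += len(df)
    return total
```

With the connection from the answer, this might be called as export_in_batches(con.cursor(), "select * from test_table", "test_table.csv"). If the query returns no rows, no file is written.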
