
Fetching huge data from Oracle in Python

I need to fetch a huge amount of data from Oracle (using cx_Oracle) in Python 2.6 and produce a CSV file.

The data size is about 400k records x 200 columns x 100 chars each.

Which is the best way to do that?

Now, using the following code...

ctemp = connection.cursor()
ctemp.execute(sql)
ctemp.arraysize = 256
for row in ctemp:
  file.write(row[1])
  ...

... the script stays in the loop for hours and nothing is written to the file... (is there a way to print a message for every record extracted?)

Note: I don't have any issue with Oracle, and running the query in SqlDeveloper is super fast.

Thank you, gian

You should use cur.fetchmany() instead. It will fetch a chunk of rows whose size is set by arraysize (256 here).

Python code:

def chunks(cur):
    """Yield batches of rows, each at most cur.arraysize (256 here) long."""
    while True:
        rows = cur.fetchmany()  # fetches up to cur.arraysize rows per round trip
        if not rows:
            break
        yield rows

Then do your processing in a for loop:

for i, chunk in enumerate(chunks(cur)):
    for row in chunk:
        pass  # process your rows here

That is exactly how I do it in my TableHunter for Oracle.
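
For the CSV output the question asks about, here is a minimal end-to-end sketch built on the same fetchmany() pattern. The connect string, query, and output filename are placeholders, not anything from the original post; csv.writer handles the quoting, and the file is opened in binary mode as the Python 2 csv module expects:

import csv
import cx_Oracle

# placeholder credentials and query -- substitute your own
connection = cx_Oracle.connect("user/password@dsn")
sql = "SELECT * FROM some_table"

cursor = connection.cursor()
cursor.arraysize = 256      # rows buffered per round trip to Oracle
cursor.execute(sql)

with open("output.csv", "wb") as f:    # binary mode for csv on Python 2
    writer = csv.writer(f)
    while True:
        rows = cursor.fetchmany()      # fetches cursor.arraysize rows
        if not rows:
            break
        writer.writerows(rows)         # write the whole chunk in one call

cursor.close()
connection.close()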

  • add print statements after each line
  • add a counter to your loop that reports progress after every N rows (see the sketch below)
  • look into a module like 'progressbar' for displaying a progress indicator
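
A minimal sketch of the counter idea, reusing the chunks() generator from the answer above; N and the message format are arbitrary choices, not part of any library API:

import sys

N = 10000       # report progress every N rows; tune to taste
count = 0
for chunk in chunks(cur):
    for row in chunk:
        # ... process the row here ...
        count += 1
        if count % N == 0:
            print "%d rows processed" % count   # Python 2 print statement
            sys.stdout.flush()                  # make progress visible immediately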

I think your code is asking the database for the data one row at a time, which might explain the slowness.

Try:

ctemp = connection.cursor()
ctemp.execute(sql)
results = ctemp.fetchall()  # pulls the entire result set into memory at once
for row in results:
    file.write(row[1])
