
Dump 13 million rows to a file using Oracle OCCI and C++

I am trying to export data from a table in the database to a file in CSV format. I came up with the program below. My table contains about 13 million rows and this program is very slow.

How can I speed up this program ?

#include <iostream>
#include <occi.h>
#include <stdlib.h>
#include <fstream>
using namespace std;

int main()
{

    oracle::occi::Environment* environment;
    oracle::occi::Connection *con;
    oracle::occi::Statement* stmt;
    oracle::occi::ResultSet* res;

    try
    {

        ofstream outfile;
        outfile.open("example.txt");
        string user ; cin>>user;
        string pass ; cin>>pass;
        string instance ; cin >>instance;
        environment = oracle::occi::Environment::createEnvironment(oracle::occi::Environment::DEFAULT);
        con = environment->createConnection(user,pass,instance);
        string query = "SELECT A,B FROM TABLE_X";

        stmt = con->createStatement(query);
        res = stmt->executeQuery();

        while (res->next())
        {
                outfile<<res->getInt(1)<<','<<res->getInt(2)<<'\n';
        }

        outfile.close();
        stmt->closeResultSet(res);
        con->terminateStatement(stmt);
        environment->terminateConnection(con);

    }catch(oracle::occi::SQLException &e){
        std::cout<<e.what();
    }

 return 0;
}

Use array fetch to reduce database round trips. The following example is from the Oracle OCCI documentation. I would experiment with values such as 20, 50, 100 and 1000 to find the optimal value for numRows in the example below.

Example 11-1 How to use Array Fetch with a ResultSet

ResultSet *resultSet = stmt->executeQuery(...);
resultSet->setDataBuffer(...);
while (resultSet->next(numRows) == DATA_AVAILABLE)
   process(resultSet->getNumArrayRows() );

This causes up to numRows rows to be fetched for each column. The buffers specified with the setDataBuffer() interface should be large enough to hold at least numRows rows of data.
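As a concrete, untested sketch of how the fetch loop from the question could be adapted (assuming columns A and B both fit in a native int, and a batch size of 1000 picked purely for experimentation):

const unsigned int batch = 1000;          // batch size to experiment with
int a[batch], b[batch];                   // one buffer per column
ub2 alen[batch], blen[batch];             // lengths returned for each row

oracle::occi::ResultSet *res = stmt->executeQuery();
res->setDataBuffer(1, a, oracle::occi::OCCIINT, sizeof(int), alen);
res->setDataBuffer(2, b, oracle::occi::OCCIINT, sizeof(int), blen);

oracle::occi::ResultSet::Status status;
do
{
    status = res->next(batch);                   // fetch up to 'batch' rows
    unsigned int got = res->getNumArrayRows();   // rows actually fetched
    for (unsigned int i = 0; i < got; ++i)
        outfile << a[i] << ',' << b[i] << '\n';
} while (status == oracle::occi::ResultSet::DATA_AVAILABLE);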

Another strategy is to split the task into ranges and run the exports in parallel. If the exported data must end up in a single file, you can merge the partial files afterwards (cat file1 file2 > file); see the sketch below.
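A minimal sketch of one such range worker, assuming (hypothetically) that column A is a numeric key the table can be split on, and that each worker gets its own [lo, hi) range and output file:

#include <fstream>
#include <string>
#include <occi.h>

// One worker: export the rows whose A value falls in [lo, hi) to 'path'.
// Each worker would use its own connection; the partial files are merged
// afterwards with: cat part_*.csv > example.csv
void dump_range(oracle::occi::Connection *con, int lo, int hi,
                const std::string &path)
{
    std::ofstream out(path.c_str());
    oracle::occi::Statement *stmt = con->createStatement(
        "SELECT A, B FROM TABLE_X WHERE A >= :1 AND A < :2");
    stmt->setInt(1, lo);
    stmt->setInt(2, hi);
    oracle::occi::ResultSet *res = stmt->executeQuery();
    while (res->next())
        out << res->getInt(1) << ',' << res->getInt(2) << '\n';
    stmt->closeResultSet(res);
    con->terminateStatement(stmt);
}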

What about the file system you are writing to? Is it slow? Have you tried writing to a different location? Take a look at the file system that the output is being written to.

I don't know what you call slow, but independently of the database fetching, you could significantly improve the file I/O performance by using write() instead of operator<<.

A little benchmark writing 1 million random CSV pairs like yours showed the following performance on my modest Windows 8 PC:

operator<<  outputs at a rate of 7 MB/s
write()     outputs at a rate of 40 MB/s

That's more than 5 times faster, i.e. around 30 seconds for 13 million entries.

The code in the loop looks ugly, however, so it's up to you to decide whether it's worth the effort:

    os << x << ',' << y << '\n'; 

becomes

    p = itoa(x, buff, 10);     // itoa is non-standard but available on MSVC
    while (*p)
        p++;                   // advance to the end of the first number
    *p++ = ',';
    itoa(y, p, 10);            // second number is written right after the comma
    while (*p)
        p++;
    *p++ = '\n';
    *p = '\0';                 // terminate without counting the NUL in the length
    os.write(buff, p - buff);  // single unformatted write of the whole line

where buff is a buffer allocated outside the loop.
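If you would rather stay with standard C++ (itoa is not part of the standard), a hypothetical helper along the same lines formats the whole line into a stack buffer and hands it to write() in one call; snprintf has its own formatting cost, so the figures above may not carry over exactly:

#include <cstdio>
#include <fstream>

// Same idea with standard functions only: build the line in a local buffer,
// then issue a single unformatted write() instead of several operator<< calls.
inline void write_pair(std::ofstream &os, int x, int y)
{
    char buff[32];   // large enough for two ints, a comma and a newline
    int n = std::snprintf(buff, sizeof buff, "%d,%d\n", x, y);
    if (n > 0)
        os.write(buff, n);
}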
