
Dump 13 million rows to a file using oracle occi and c++

I am trying to export data from a table in a database to a file in CSV format. I came up with the program below. My table contains about 13 million rows and this program is very slow.

How can I speed up this program?

#include <iostream>
#include <occi.h>
#include <stdlib.h>
#include <fstream>
using namespace std;

int main()
{

    oracle::occi::Environment* environment;
    oracle::occi::Connection *con;
    oracle::occi::Statement* stmt;
    oracle::occi::ResultSet* res;

    try
    {

        ofstream outfile;
        outfile.open("example.txt");
        string user ; cin>>user;
        string pass ; cin>>pass;
        string instance ; cin >>instance;
        environment = oracle::occi::Environment::createEnvironment(oracle::occi::Environment::DEFAULT);
        con = environment->createConnection(user,pass,instance);
        string query = "SELECT A,B FROM TABLE_X";

        stmt = con->createStatement(query);
        res = stmt->executeQuery();

        while (res->next())
        {
                outfile<<res->getInt(1)<<','<<res->getInt(2)<<'\n';
        }

        outfile.close();
        stmt->closeResultSet(res);
        con->terminateStatement(stmt);
        environment->terminateConnection(con);

    }catch(oracle::occi::SQLException &e){
        std::cout<<e.what();
    }

 return 0;
}

Use array fetch to reduce database round trips. The following is from here . I would experiment with the values 20, 50, 100, 1000 to find the optimal value for numRows in the example below.

Example 11-1 How to use Array Fetch with a ResultSet

ResultSet *resultSet = stmt->executeQuery(...);
resultSet->setDataBuffer(...);
while (resultSet->next(numRows) == DATA_AVAILABLE)
   process(resultSet->getNumArrayRows() );

This causes up to numRows worth of data to be fetched for each column. The buffers specified with the setDataBuffer() interface should be large enough to hold at least numRows of data.
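Applied to the question's query, the array fetch might look like the sketch below. This is an untested sketch, not a drop-in replacement: it assumes columns A and B are integer NUMBERs, reuses the res and outfile variables from the question, and follows the loop shape of Example 11-1 (consult the OCCI documentation for how the final, partial batch is reported).

```cpp
// Sketch: array fetch of two integer columns, numRows at a time.
const unsigned int numRows = 1000;      // batch size; experiment with 20..1000
int aBuf[numRows], bBuf[numRows];       // one slot per row in the batch
ub2 aLen[numRows], bLen[numRows];       // actual data lengths filled in by OCCI

// Bind a whole-batch buffer to each select-list column (1-based indexes).
res->setDataBuffer(1, aBuf, oracle::occi::OCCIINT, sizeof(int), aLen);
res->setDataBuffer(2, bBuf, oracle::occi::OCCIINT, sizeof(int), bLen);

// Each next(numRows) call is a single round trip fetching up to numRows rows.
while (res->next(numRows) == oracle::occi::ResultSet::DATA_AVAILABLE)
{
    unsigned int got = res->getNumArrayRows();   // rows actually in this batch
    for (unsigned int i = 0; i < got; ++i)
        outfile << aBuf[i] << ',' << bBuf[i] << '\n';
}
```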

Another strategy is to split the task into ranges and run these in parallel. If the exported data must end up in a single file, you can merge the pieces afterwards (cat file1 file2 > file).

What about the file system you are writing to? Is it slow? Have you tried writing to a different location, or had a look at the file system that the output is being written to?

I don't know what you call slow. But independently of the database fetching, you could significantly improve the writing performance of the file I/O by using write() instead of operator<< .

A little benchmark writing 1 million random CSV pairs like yours showed the following performance on my modest Win8 PC:

operator<<  outputs at a rate of 7 Mb/s
write()     outputs at a rate of 40 Mb/s

That's more than 5 times faster, i.e. around 30 seconds for 13 million entries.

The resulting code looks ugly, however, so it's up to you to decide whether it's worth the effort:

    os << x << ',' << y << '\n'; 

becomes

    p = itoa(x, buff, 10);   // note: itoa is non-standard; snprintf or
    while (*p)               // std::to_chars are portable alternatives
        p++;                 // advance past the digits to the '\0'
    *p++ = ',';
    itoa(y, p, 10);
    while (*p)
        p++;
    *p++ = '\n';
    os.write(buff, p - buff);   // do not append a '\0' -- it would end up in the file

where buff is a buffer allocated outside the loop.
