
How to overcome OutOfMemoryError during huge file write

I am writing a full database extract program in Java. The database is Oracle, and it is huge; some tables have ~260 million records. The program should create one file per table in a specific format, so using Oracle Data Pump etc. is not an option. Also, company security policy does not allow writing a PL/SQL procedure that creates files on the DB server for this requirement. I have to go with Java and JDBC.

The issue I am facing is that since the files for some of the tables are huge (~30 GB), I run out of memory almost every time, even with a 20 GB Java heap. Once the file size exceeds the heap size during the write, the process seems to hang, even with one of the most aggressive GC policies. For example, if the file size is > 20 GB and the heap size is 20 GB, then once heap utilization hits the maximum, the write rate slows to roughly 2 MB per minute, and at that speed it would take months to get a full extract.

I am looking for some way to overcome this issue. Any help would be greatly appreciated.

Here are some details of the system configuration I have:

Java - JDK 1.6.0_14

System config - RH Enterprise Linux (2.6.18) running on 4 x Intel Xeon E7450 (6 cores) @ 2.39 GHz

RAM - 32 GB

Database - Oracle 11g

The file-writing part of the code is below:

private void runQuery(Connection conn, String query, String filePath,
        String fileName) throws SQLException, Exception {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
        stmt = conn.prepareStatement(query,
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(maxRecBeforWrite);
        rs = stmt.executeQuery();
        // Write query result to file
        writeDataToFile(rs, filePath + "/" + fileName, getRecordCount(
                query, conn));
    } catch (SQLException sqle) {
        sqle.printStackTrace();
    } finally {
        // null checks avoid an NPE here if prepareStatement/executeQuery failed
        if (rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
    }
}

private void writeDataToFile(ResultSet rs, String tempFile, String cnt)
        throws SQLException, Exception {
    FileOutputStream fileOut = null;
    int maxLength = 0;
    try {
        fileOut = new FileOutputStream(tempFile, true);
        FileChannel fcOut = fileOut.getChannel();

        List<TableMetaData> metaList = getMetaData(rs);
        maxLength = getMaxRecordLength(metaList);
        // Write Header
        writeHeaderRec(fileOut, maxLength);
        while (rs.next()) {
            // Now iterate on metaList and fetch all the column values.
            writeData(rs, metaList, fcOut);
        }
        // Write trailer
        writeTrailerRec(fileOut, cnt, maxLength);
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        // null check avoids an NPE if the FileOutputStream was never opened
        if (fileOut != null) {
            fileOut.close();
        }
    }
}

private void writeData(ResultSet rs, List<TableMetaData> metaList,
        FileChannel fcOut) throws SQLException, IOException {
    StringBuilder rec = new StringBuilder();
    String lf = "\n";
    for (TableMetaData tabMeta : metaList) {
        rec.append(getFormattedString(rs, tabMeta));
    }
    rec.append(lf);
    ByteBuffer byteBuf = ByteBuffer.wrap(rec.toString()
            .getBytes("US-ASCII"));
    fcOut.write(byteBuf);
}

private String getFormattedString(ResultSet rs, TableMetaData tabMeta)
        throws SQLException, IOException {
    String colValue = null;
    // check if it is a CLOB column
    if (tabMeta.isCLOB()) {
        // Column is a CLOB, so fetch it and retrieve first clobLimit chars.
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                getCLOBString(rs, tabMeta));
    } else {
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s", rs
                .getString(tabMeta.getColumnName()));
    }
    return colValue;

}

It's probably due to the way you call prepareStatement; see this question for a similar problem. You don't need scrollability, and a ResultSet is read-only by default, so just call:

stmt = conn.prepareStatement(query);
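With the default forward-only, read-only cursor, the Oracle driver only has to keep one fetch batch of rows in memory, whereas a TYPE_SCROLL_INSENSITIVE result set is typically cached in full on the client, which is what eats the heap. A slightly fuller sketch of the same idea, reusing the poster's own maxRecBeforWrite and writeData:

stmt = conn.prepareStatement(query);    // TYPE_FORWARD_ONLY, CONCUR_READ_ONLY are the defaults
stmt.setFetchSize(maxRecBeforWrite);    // rows fetched per round trip, not the whole table
rs = stmt.executeQuery();
while (rs.next()) {
    writeData(rs, metaList, fcOut);     // write each row out as soon as it is read
}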

Edit: Map your database tables to classes using JPA.
Now load the objects from the DB with Hibernate in batches of some tolerable size and serialize them to the file, as sketched below.
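If you go that route, one way to keep memory bounded is Hibernate's forward-only scrollable cursor with the session cleared after each batch. This is only a sketch; ExtractRow, formatRow, and the session/writer setup are assumed here, not part of the original post:

// Stream a mapped entity with a forward-only cursor and keep the session small.
ScrollableResults cursor = session.createQuery("from ExtractRow")
        .setReadOnly(true)
        .setFetchSize(1000)
        .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while (cursor.next()) {
    ExtractRow row = (ExtractRow) cursor.get(0);   // ExtractRow: hypothetical mapped entity
    writer.write(formatRow(row));                  // formatRow: hypothetical fixed-width formatter
    if (++count % 1000 == 0) {
        session.clear();                           // evict already-written entities from the session
    }
}
cursor.close();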

Is your algorithm like the following? This assumes a direct mapping between DB rows and lines in the file:

// open file for writing with buffered writer.
// execute JDBC statement
// iterate through result set
    // convert rs to file format
    // write to file
// close file
// close statement/rs/connection etc
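A minimal Java 6 style sketch of that loop, where each row is written and discarded immediately so heap usage stays flat regardless of file size (formatRow is a placeholder for the poster's fixed-width formatting, not an existing method):

private void extractTable(Connection conn, String query, String outFile)
        throws SQLException, IOException {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    BufferedWriter out = null;
    try {
        stmt = conn.prepareStatement(query);       // forward-only, read-only by default
        stmt.setFetchSize(1000);                   // rows buffered per round trip, not the whole table
        rs = stmt.executeQuery();
        out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outFile), "US-ASCII"), 1 << 20);
        while (rs.next()) {
            out.write(formatRow(rs));              // placeholder for the fixed-width formatting
            out.write('\n');                       // nothing is accumulated between rows
        }
    } finally {
        if (out != null) out.close();
        if (rs != null) rs.close();
        if (stmt != null) stmt.close();
    }
}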

Try using Spring's JdbcTemplate to simplify the JDBC portion.
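For instance, JdbcTemplate with a RowCallbackHandler walks the result set for you and lets each row be written straight to the file. A sketch only; jdbcTemplate, query, and a final writer are assumed to be set up elsewhere:

jdbcTemplate.setFetchSize(1000);                    // stream in small batches
jdbcTemplate.query(query, new RowCallbackHandler() {
    public void processRow(ResultSet rs) throws SQLException {
        try {
            writer.write(formatRow(rs));            // write each row immediately, keep nothing in memory
            writer.write('\n');
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
});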

I believe this should be possible on the default 32 MB Java heap. Just fetch each row, write the data to the file stream, flush, and close once done.

What value are you using for maxRecBeforWrite?

Perhaps the query for the max record length is defeating your setFetchSize by forcing JDBC to scan the entire result for the record length? Maybe you could delay writing the header and note the max record size on the fly.
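If the file format allows it, one way to do that is to reserve a fixed-size header, track the maximum record length while streaming the rows, and then seek back to fill the header in at the end. A sketch using RandomAccessFile; HEADER_LENGTH, formatRecord, and buildHeader are placeholders, not from the original code:

RandomAccessFile raf = new RandomAccessFile(tempFile, "rw");
raf.write(new byte[HEADER_LENGTH]);        // reserve space for a header of known, fixed length
int maxLength = 0;
while (rs.next()) {
    byte[] rec = formatRecord(rs);         // placeholder for row formatting
    maxLength = Math.max(maxLength, rec.length);
    raf.write(rec);
}
raf.seek(0);                               // go back and write the real header
raf.write(buildHeader(maxLength));         // placeholder; must produce exactly HEADER_LENGTH bytes
raf.close();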
