Strange behavior for Java class CachedRowSetImpl

I am having difficulty using the CachedRowSetImpl class in Java. I want to analyse the data of a huge Postgres table that contains ~35,000,000 rows and 3 integer columns.

I cannot load everything into my computer's physical memory, so I want to read the rows in batches of 100000. When I execute the corresponding query (select col1,col2,col3 from theTable limit 100000) at the psql prompt or in a graphical interface such as pgAdmin, it takes around 4000 ms and a few megabytes of memory to load the 100000 rows.

I tried to do the same operation with the following Java code:

CachedRowSet rowset = new CachedRowSetImpl();
int pageSize=1000000;
rowset.setCommand("select pk_lib_scaf_a,pk_lib_scaf_b,similarity_evalue from from_to_scaf");
rowset.setPageSize(pageSize);
rowset.setReadOnly(true);
rowset.setFetchSize(pageSize);
rowset.setFetchDirection(ResultSet.FETCH_FORWARD);
rowset.execute(myConnection);

System.out.println("start !");

while (rowset.nextPage()) {
    while (rowset.next()) {
        //treatment of current data page

    } // End of inner while
    rowset.release();
} 

When I run the above code, the "start !" message is never displayed in the console and execution seems to be stuck at the rowset.execute() line. Moreover, memory consumption keeps growing until it reaches the limit of my computer's physical memory (8 GB).

That's strange: it looks like the program tries to fill the rowset with the ~35,000,000 lines, without considering the pageSize configuration.

Has anybody experienced such a problem with Java JDBC and the Postgres driver? What am I missing?

postgres 9.1, java jdk 1.7

From the CachedRowSet Javadoc (emphasis mine):

A CachedRowSet object is a disconnected rowset, which means that it makes use of a connection to its data source only briefly. It connects to its data source while it is reading data to populate itself with rows and again while it is propagating changes back to its underlying data source. The rest of the time, a CachedRowSet object is disconnected, including while its data is being modified.

From your question:

it looks like the program tries to fill the rowset with the ~35,000,000 lines, without considering the pageSize configuration

Yes, CachedRowSet will retrieve all 35 million rows from your database at once, and only after that will it apply the pagination and the other properties you defined. A possible solution would be to process the data in small chunks, with a flag on each row to mark it as processed.
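One way to read in small chunks is sketched below with plain JDBC. Note that it is a variant of the idea above: instead of a "processed" flag it remembers the last key it has seen (keyset pagination), so nothing is written back to the table. It assumes that (pk_lib_scaf_a, pk_lib_scaf_b) uniquely identifies a row and is indexed, which the question does not confirm. The names ChunkedScan, RowHandler and scanInChunks are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of JDBC: reads from_to_scaf in bounded chunks.
final class ChunkedScan {

    // Callback invoked once per row (illustrative interface).
    interface RowHandler {
        void handle(int a, int b, int evalue) throws SQLException;
    }

    // Assumption: (pk_lib_scaf_a, pk_lib_scaf_b) is unique, so the row-value
    // comparison resumes exactly where the previous chunk stopped.
    private static final String SQL =
        "select pk_lib_scaf_a, pk_lib_scaf_b, similarity_evalue from from_to_scaf "
      + "where (pk_lib_scaf_a, pk_lib_scaf_b) > (?, ?) "
      + "order by pk_lib_scaf_a, pk_lib_scaf_b limit ?";

    static long scanInChunks(Connection conn, int chunkSize, RowHandler handler)
            throws SQLException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Start below every key; assumes no row has both keys equal to MIN_VALUE.
        int lastA = Integer.MIN_VALUE;
        int lastB = Integer.MIN_VALUE;
        long total = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            while (true) {
                ps.setInt(1, lastA);
                ps.setInt(2, lastB);
                ps.setInt(3, chunkSize);
                int inChunk = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastA = rs.getInt(1);
                        lastB = rs.getInt(2);
                        handler.handle(lastA, lastB, rs.getInt(3));
                        inChunk++;
                    }
                }
                total += inChunk;
                // A short chunk means the table is exhausted.
                if (inChunk < chunkSize) {
                    return total;
                }
            }
        }
    }
}
```

Each query returns at most chunkSize rows, so only one chunk is held in memory at a time, at the cost of one round trip and one index lookup per chunk.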

I would recommend using an ETL tool such as Pentaho, which already handles this kind of problem.

In fact, cursor support is built into the Postgres JDBC driver, as described in its documentation. However, a cursor is only created automatically under certain conditions.

http://jdbc.postgresql.org/documentation/head/query.html
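As I read the linked documentation, the driver only switches to a cursor when the connection is not in autocommit mode, the statement is forward-only, the query is a single statement, and a non-zero fetch size is set. The sketch below meets those conditions with a plain PreparedStatement instead of a CachedRowSet; the names CursorScan, RowHandler and scan are hypothetical, and whether the same settings take effect through CachedRowSetImpl is not something this sketch shows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of JDBC or the Postgres driver.
final class CursorScan {

    // Callback invoked once per row (illustrative interface).
    interface RowHandler {
        void handle(int a, int b, int c) throws SQLException;
    }

    // Streams a three-integer-column query, asking the driver to fetch
    // fetchSize rows per round trip instead of the whole result set.
    static long scan(Connection conn, String sql, int fetchSize, RowHandler handler)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // cursors require an open transaction
        long rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize); // must be non-zero to enable the cursor
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(rs.getInt(1), rs.getInt(2), rs.getInt(3));
                    rows++;
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return rows;
    }
}
```

With the query from the question, a call might look like CursorScan.scan(myConnection, "select pk_lib_scaf_a,pk_lib_scaf_b,similarity_evalue from from_to_scaf", 10000, handler), where 10000 is only an illustrative fetch size.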
