
Java - Huge Data Retrieval

I have a requirement where, for one of the reports, I need to fetch around 10 million records from the database and transfer them to Excel.

The application follows a client-server model, where the server-side logic is written in EJB and the client is written in Swing.

Now my question is: when I try to fill Java objects from the ResultSet, if the result set is large (> 100,000 rows), it throws an OutOfMemoryError on the Java side.

Can someone let me know how this scenario should be handled in Java? I need to transfer all the records from the server to the client, and then build the Excel report based on the data retrieved from the server side.

I would break the result set into smaller chunks by using the LIMIT clause (MySQL; I don't know whether other DB servers have it). Something like this pseudo-code:

long recsToGet = 50000;
long got = recsToGet;
long offset = 0;
while ( got == recsToGet )
{
  got = getNextBatchFromDb( offset ); // returns the number of rows fetched
  writeBatchToCsv();                  // append the fetched rows to the CSV
  offset += recsToGet;                // increase your OFFSET each time
}

And I would use LIMIT and OFFSET in the SQL query inside the getNextBatchFromDb() function, like this:

select * from yourtable LIMIT 50000 OFFSET 100000

where the OFFSET is the position to begin reading from and LIMIT is the number to read.

By doing this you can read your big dataset in smaller chunks and update the CSV each time until completed. You know all records have been read when getNextBatchFromDb() returns a number of rows smaller than recsToGet.
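A minimal Java/JDBC sketch of this approach, assuming MySQL-style LIMIT/OFFSET syntax; the table, column names and CSV layout are only illustrative:

import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchCsvExport {

    private static final int RECS_TO_GET = 50000;

    public static void export(Connection conn, Writer csv) throws SQLException, IOException {
        long offset = 0;
        int got = RECS_TO_GET;
        // Keep going until a batch comes back smaller than the requested size.
        while (got == RECS_TO_GET) {
            got = writeNextBatch(conn, csv, offset);
            offset += RECS_TO_GET;              // advance the OFFSET each time
        }
    }

    // Reads one LIMIT/OFFSET batch and appends it to the CSV; returns the row count.
    private static int writeNextBatch(Connection conn, Writer csv, long offset)
            throws SQLException, IOException {
        String sql = "SELECT id, name, amount FROM yourtable ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, RECS_TO_GET);
            ps.setLong(2, offset);
            int rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    csv.write(rs.getLong("id") + "," + rs.getString("name")
                            + "," + rs.getBigDecimal("amount") + "\n");
                    rows++;
                }
            }
            return rows;
        }
    }
}

Note the ORDER BY: without a stable ordering, consecutive LIMIT/OFFSET pages can overlap or skip rows.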

You could increase the amount of memory available to the JVM using the -Xmx switch (e.g. -Xmx1024m gives the JVM up to 1 GB of heap).

If this is not an option or you've already done this, the only alternative is to rewrite the server to return results in stages rather than all at once. How you do this will depend on the specifics of the server implementation.
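One possible shape for such a staged API, sketched here purely as an illustration (the interface, the ReportRow type and the page size are assumptions, not part of the original application):

import java.util.List;

// Hypothetical remote interface: the client pulls one page at a time
// instead of receiving the entire result set in a single call.
public interface ReportService {
    // Returns at most pageSize rows starting at the given offset;
    // a short (or empty) page signals that all rows have been delivered.
    List<ReportRow> fetchPage(long offset, int pageSize);
}

The client would then loop over pages, writing each one out before fetching the next:

long offset = 0;
List<ReportRow> page;
do {
    page = reportService.fetchPage(offset, 10_000);
    appendToReport(page);   // hypothetical helper that writes the page out immediately
    offset += page.size();
} while (page.size() == 10_000);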

You will need to do as much of the work on the database side as you can. Then, as you read the data from the database, write it out as you go (or at least through a small buffer), so that you never hold the entire data set in the Java program.
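As a sketch of this with plain JDBC, you can hint the driver to fetch rows in chunks and write each row out as it is read, so the whole result set is never materialised (the query, columns and file name are illustrative, and how setFetchSize is honoured is driver-specific):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingCsvExport {
    public static void run(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             BufferedWriter out = new BufferedWriter(new FileWriter("report.csv"))) {
            stmt.setFetchSize(1000); // ask the driver to fetch in chunks rather than all at once
            try (ResultSet rs = stmt.executeQuery("SELECT id, name, amount FROM yourtable")) {
                while (rs.next()) {
                    // Write each row immediately; only the current row is kept in memory.
                    out.write(rs.getLong(1) + "," + rs.getString(2) + "," + rs.getBigDecimal(3));
                    out.newLine();
                }
            }
        }
    }
}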

Instead of using an Object, you may be able to use a primitive type. Note: unless your client has more memory than your server, there is no point sending all this data to the client.

Normally, the server generates reports for the client. This maximises the work done by the server and minimises the data sent to the client. Excel cannot handle more than 1,048,576 rows in a sheet, and its charts cannot handle more than 32,000 points. I suggest you generate the report on the server.
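If the output really has to be an .xlsx workbook built on the server, one option (an assumption about the toolset; Apache POI is not mentioned in the question) is POI's streaming SXSSFWorkbook, which keeps only a small window of rows in memory and lets you roll over to a new sheet before hitting the 1,048,576-row limit:

import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class ServerSideExcelReport {
    private static final int MAX_ROWS_PER_SHEET = 1_048_576; // Excel's hard per-sheet limit

    public static void build(Connection conn) throws Exception {
        // Keep only 100 rows in memory; older rows are flushed to a temporary file.
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, amount FROM yourtable")) {
            Sheet sheet = wb.createSheet("report-1");
            int rowNum = 0;
            int sheetNum = 1;
            while (rs.next()) {
                if (rowNum == MAX_ROWS_PER_SHEET) {            // start a new sheet when full
                    sheet = wb.createSheet("report-" + (++sheetNum));
                    rowNum = 0;
                }
                Row row = sheet.createRow(rowNum++);
                row.createCell(0).setCellValue(rs.getLong("id"));
                row.createCell(1).setCellValue(rs.getString("name"));
                row.createCell(2).setCellValue(rs.getDouble("amount"));
            }
            try (FileOutputStream out = new FileOutputStream("report.xlsx")) {
                wb.write(out);
            }
            wb.dispose(); // delete the temporary files backing the streaming workbook
        }
    }
}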

A plain Object is not a good choice for this scenario. Following are a few of the options to handle this.

1) Apply pagination while retrieving the records from the database and append each page to your report.

2) This option depends on the database server. Some DB servers can export the output of any query to a flat file; check whether your DB supports this. After exporting, you can read the contents of the flat file and generate your report (a MySQL example is sketched below).
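For example, on MySQL the export in option 2 can be done with SELECT ... INTO OUTFILE issued over JDBC. This is MySQL-specific, requires the FILE privilege, and writes the file on the database host; the path and columns here are only illustrative:

import java.sql.Connection;
import java.sql.Statement;

public class DbSideExport {
    public static void export(Connection conn) throws Exception {
        // The database itself writes the query result to a flat file on the DB host;
        // the report is then built by reading and parsing that file.
        String sql = "SELECT id, name, amount "
                   + "INTO OUTFILE '/tmp/report.csv' "
                   + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
                   + "FROM yourtable";
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}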

Others have already mentioned the row restrictions of Excel, so you have to take care of that as well.
