
How to download a large data file in Java when the data is fetched in chunks?

I need to write bulk data to a file. When I tried to write the data in one shot, I sometimes got an OutOfMemoryException in my Java code. To handle this, I am trying to write different code where I open the file once and write the data to it in chunks, so that my heap does not grow. I am looking for the best approach for this case. My source data will be a REST service's response, and I will write that data to the destination file.

Please suggest the best approach for writing this data to a file.

I am trying to handle this case with the following logic:

  • Open the output file as a BufferedOutputStream
  • Get the response from the REST GET request
  • Convert that response into a byte[]
  • Write the byte[] to the file with buffOut.write(arr, 0, available);
  • Flush the buffered output stream with buffOut.flush();
  • Repeat until there is no more data to write (a sketch of this loop follows the list).
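A minimal sketch of that loop, assuming the REST response is exposed as an InputStream; the endpoint URL, destination path, and buffer size below are hypothetical, and the variable names mirror the list above:

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkedResponseWriter {

    public static void main(String[] args) throws IOException {
        // Hypothetical REST endpoint and destination file
        URL url = new URL("http://example.com/api/bulk-data");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        try (InputStream in = conn.getInputStream();
             BufferedOutputStream buffOut = new BufferedOutputStream(
                     Files.newOutputStream(Paths.get("c:/temp/bulk-data.out")))) {

            byte[] arr = new byte[8192]; // fixed-size chunk keeps heap usage flat
            int available;
            while ((available = in.read(arr)) != -1) {
                buffOut.write(arr, 0, available); // write only the bytes actually read
            }
            buffOut.flush(); // try-with-resources also flushes on close
        }
    }
}

Because only one 8 KB buffer is ever held in memory, the heap stays flat no matter how large the response is.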

Java Streams look like a very suitable option for your use case. Processing a file with Java streams yields better results than a Scanner, a BufferedReader, or Java NIO with memory-mapped files.

Here is performance comparison of processing ability of various Java Alternatives:

File Size :- 1 GB

  1. Scanner approach: Total elapsed time: 15627 ms

  2. Mapped Byte Buffer: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

  3. Java 8 Stream: Total elapsed time: 3124 ms

  4. Java 7 Files: Total elapsed time: 13657 ms

Sample Processing Example is as below:

package com.large.file;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class Java8StreamRead {

    public static void main(String[] args) {

        long startTime = System.nanoTime();
        Path file = Paths.get("c:/temp/my-large-file.csv");

        // Java 8 Stream: lines are read lazily, so the whole file is never held in memory.
        // try-with-resources closes the underlying file handle when processing is done.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {

            for (String line : (Iterable<String>) lines::iterator) {
                // process each line here, e.g. System.out.println(line);
            }

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

        long endTime = System.nanoTime();
        long elapsedTimeInMillis = TimeUnit.MILLISECONDS.convert(endTime - startTime, TimeUnit.NANOSECONDS);
        System.out.println("Total elapsed time: " + elapsedTimeInMillis + " ms");
    }
}

Try the following:

URL url = new URL("http://large.file.dat");
Path path = Paths.get("/home/it/documents/large.file.dat");
try (InputStream in = url.openStream()) {
    // Files.copy streams the data straight to disk without holding it all in memory
    Files.copy(in, path);
}

Chunked transfer should not matter, unless you want to work with parts of the file, for example when the connection is likely to fail after a while.
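If you do need to resume a partial download, a rough sketch could look like the following; it assumes the server honors HTTP Range requests, with url and path as in the snippet above:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

// Assumes the server supports Range requests; url and path as above
long existing = Files.exists(path) ? Files.size(path) : 0;
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Range", "bytes=" + existing + "-"); // ask only for the remaining bytes
try (InputStream in = conn.getInputStream();
     OutputStream out = Files.newOutputStream(path,
             StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
    byte[] buffer = new byte[8192];
    int read;
    while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read); // append the new bytes to the partial file
    }
}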

You can enable compression by sending the appropriate Accept-Encoding header and wrapping the InputStream in a GZIPInputStream, or use Apache's HttpClient, which supports this out of the box.
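A minimal sketch of the compression variant; the URL is hypothetical, and the fallback to the raw stream handles servers that ignore the header:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical URL; ask the server for a gzip-compressed body
HttpURLConnection conn = (HttpURLConnection) new URL("http://example.com/large.file.dat").openConnection();
conn.setRequestProperty("Accept-Encoding", "gzip");
try (InputStream raw = conn.getInputStream();
     InputStream in = "gzip".equals(conn.getContentEncoding())
             ? new GZIPInputStream(raw) : raw) { // fall back if the server ignored the header
    Files.copy(in, Paths.get("/home/it/documents/large.file.dat"));
}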
