
How to download a large data file in Java, where the data is fetched in chunks?

I need to write my bulk data to a file. When I tried to write all of the data to the file in one shot, I sometimes got an OutOfMemoryError in my Java code. To handle this case I am trying to write different code where I open the file once and write the data to it in chunks, so that my heap memory does not grow. I am looking for the best approach for this case. My source data will be a REST service's response data, and I will write that data to the destination file.

Please suggest the best approach for writing the data to a file.

I am trying to handle this case with the following logic:

  • Open the output file as a BufferedOutputStream
  • Get the response from the REST GET request
  • Convert that response into a byte[]
  • Write the byte[] to the file with buffOut.write(arr, 0, available);
  • Flush the buffer with buffOut.flush();
  • Repeat until there is no more data to write to the file.
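The steps above can be sketched as a chunked copy loop. This is a minimal sketch, not your exact service call: the endpoint URL and output path are placeholders, and the fixed-size buffer is what keeps heap usage constant no matter how large the response is.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkedDownload {

    // Copy the stream in fixed-size chunks so that only one small buffer
    // lives on the heap, regardless of the total payload size.
    static long copyInChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder REST endpoint and target file -- adjust for your service.
        URL url = new URL("http://example.com/rest/large-data");
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(
                     Files.newOutputStream(Paths.get("large-data.bin")))) {
            copyInChunks(in, out);
        }
    }
}
```

Because the response bytes are consumed as a stream and written immediately, the full payload is never materialized as one large byte[] in memory, which is what caused the OutOfMemoryError in the one-shot approach.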

Java Streams look like a very suitable option for your use case. Processing the file with Java streams yields better results than a Scanner, a BufferedReader, or Java NIO with memory-mapped files.

Here is a performance comparison of the processing ability of various Java alternatives:

File size: 1 GB

  1. Scanner approach: Total elapsed time: 15627 ms

  2. Mapped byte buffer: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

  3. Java 8 Stream: Total elapsed time: 3124 ms

  4. Java 7 Files: Total elapsed time: 13657 ms

A sample processing example is below:

 package com.large.file;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class Java8StreamRead {

    public static void main(String[] args) {

        long startTime = System.nanoTime();
        Path file = Paths.get("c:/temp/my-large-file.csv");

        // Java 8 Stream: lines are read lazily, so the whole file is never
        // held in memory at once. try-with-resources closes the stream
        // (and the underlying file handle) when the loop finishes.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {

            for (String line : (Iterable<String>) lines::iterator) {
                // process the line, e.g. System.out.println(line);
            }

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

        long endTime = System.nanoTime();
        long elapsedTimeInMillis = TimeUnit.MILLISECONDS.convert(endTime - startTime, TimeUnit.NANOSECONDS);
        System.out.println("Total elapsed time: " + elapsedTimeInMillis + " ms");
    }
}

Try the following:

URL url = new URL("http://large.file.dat");
Path path = Paths.get("/home/it/documents/large.file.dat");
Files.copy(url.openStream(), path);

Chunked transfer should not matter, unless you want to work with parts of the file, for example when the connection is likely to fail after some time.
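For that partial-download case, an HTTP Range request lets you resume where a failed transfer left off. A minimal sketch, assuming the URL is a placeholder and the server answers Range requests with 206 Partial Content:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangedDownload {

    // Build the Range header value asking the server for everything
    // from the given byte offset onward.
    static String rangeHeader(long alreadyHave) {
        return "bytes=" + alreadyHave + "-";
    }

    public static void main(String[] args) throws IOException {
        File target = new File("large.file.dat");
        long alreadyHave = target.exists() ? target.length() : 0;

        // Placeholder URL -- the server must support byte ranges.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.com/large.file.dat").openConnection();
        conn.setRequestProperty("Range", rangeHeader(alreadyHave));

        try (InputStream in = conn.getInputStream();
             // Append mode, so resumed bytes land after what we already saved.
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target, true))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```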

You can use compression by sending the appropriate request headers and wrapping the InputStream in a GZIPInputStream. Or use Apache's HttpClient, which supports this out of the box.
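A sketch of that header-plus-wrapper approach (the URL is a placeholder; note that java.net.HttpURLConnection does not decompress gzip by itself, which is why the stream is wrapped manually here):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipDownload {

    // Wrap the response stream in a GZIPInputStream only when the server
    // actually sent a gzip-encoded body; otherwise pass it through as-is.
    static InputStream maybeGunzip(InputStream raw, String contentEncoding) throws IOException {
        return "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL -- adjust for your service.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.com/large.file.dat").openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");

        try (InputStream in = maybeGunzip(conn.getInputStream(), conn.getContentEncoding());
             OutputStream out = new BufferedOutputStream(new FileOutputStream("large.file.dat"))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

The decompression is streamed as well, so the compressed transfer saves bandwidth without ever holding the inflated payload in memory.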
