Reading text file to string without huge memory consumption

I've tried to measure the performance of several approaches to reading a file into a string: NIO (the slowest for reading a single file), BufferedInputStream with line-by-line reading (600 ms average per pass), and a FileReader with a fixed-size array acting as a buffer (the fastest).

The file was 95 MB of plain text in Windows .txt format. Converting chars to a string really is the bottleneck, but what I noticed is the HUGE memory consumption of this method: for 95 MB of lorem ipsum, it consumes up to 1 GB of RAM. I haven't found out why.

What I have tried, with no effect:

Calling System.gc() to request a garbage collection.
Setting all reference variables to null before the method ends (they should be released anyway, since they are defined only within the method).

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

private void testCharStream() {
    File f = new File("c:/Downloads/test.txt");
    long oldTime = System.currentTimeMillis();
    char[] cbuf = new char[8192];
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the reader even if read() throws
    try (FileReader reader = new FileReader(f)) {
        int n;
        // read() returns how many chars were filled; append only those,
        // otherwise the last pass appends stale data from the buffer
        while ((n = reader.read(cbuf)) != -1) {
            builder.append(cbuf, 0, n);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long currentTime = System.currentTimeMillis();

    System.out.println(currentTime - oldTime);
}
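One likely contributor to the memory use (an assumption on my part, not something measured in this post) is that a StringBuilder starting at its default capacity has to reallocate and copy its internal array many times as it grows, and toString() then makes one more copy. A minimal sketch of the same chunked read with the builder pre-sized from the file length follows; the class name ChunkedFileReader and the method readAll are hypothetical, and the byte length is only used as an upper bound on the number of chars.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original post.
final class ChunkedFileReader {

    // Reads the whole file into a String through a fixed-size char buffer.
    static String readAll(Path file, Charset charset, int bufferSize) throws IOException {
        // The byte length is an upper bound on the char count for UTF-8 and
        // single-byte encodings, so the builder never has to grow.
        long size = Files.size(file);
        int capacity = (int) Math.min(size, Integer.MAX_VALUE - 8);
        StringBuilder builder = new StringBuilder(capacity);

        char[] cbuf = new char[bufferSize];
        try (Reader reader = Files.newBufferedReader(file, charset)) {
            int n;
            // Append only the n chars that this read() call actually filled.
            while ((n = reader.read(cbuf)) != -1) {
                builder.append(cbuf, 0, n);
            }
        }
        return builder.toString();
    }
}
```

A call such as ChunkedFileReader.readAll(Paths.get("c:/Downloads/test.txt"), StandardCharsets.UTF_8, 8192) would mirror the setup above; the charset is a guess, since the post does not say how the file is encoded.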

Try Apache Commons IO: http://commons.apache.org/proper/commons-io/ I haven't benchmarked it, but I assume the code is optimized.
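For comparison without adding a dependency, here is a JDK-only sketch of the same "whole file to String" operation. It is a stand-in, not Commons IO itself; the class name WholeFileRead is hypothetical, and the charset is something the caller has to know.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original post.
final class WholeFileRead {

    // Loads all bytes at once, then decodes them in a single step.
    // With Commons IO on the classpath, the corresponding call would be
    // FileUtils.readFileToString(file.toFile(), charset).
    static String readToString(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, charset);
    }
}
```

Note that this approach briefly holds both the raw bytes and the decoded string in memory, so it is not necessarily lighter than the chunked read.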

I came up with a decent solution. Using the Apache Commons IO package, the memory peak was 777.1 MB, the lowest value 220 MB, and an average of 710 ms was needed to read the 95 MB text file.

What I did was set the variable referencing the StringBuilder to null at the end of the method and suggest that the garbage collector actually do its work (System.gc()). The memory peak is now 540 MB, a little more than half of the value previously reached! Also, changing the buffer size to 1024 gives a 40 ms improvement per pass, from 490 ms to 450 ms or even less. So my function needs only 63.4% of the time Apache Commons IO takes to read the file (450 ms vs. 710 ms); that's almost 40% less. Any ideas how to improve the performance even more?

Here is the function.

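private void testCharStream() {
    // f is not defined in this snippet; assumed to be the same file as before
    File f = new File("c:/Downloads/test.txt");
    long oldTime = System.currentTimeMillis();
    char[] cbuf = new char[1024];
    StringBuilder builder = new StringBuilder();

    // try-with-resources closes the reader even if read() throws
    try (FileReader reader = new FileReader(f)) {
        int n;
        // append only the chars that this read() call actually filled
        while ((n = reader.read(cbuf)) != -1) {
            builder.append(cbuf, 0, n);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long currentTime = System.currentTimeMillis();
    // drop the only reference to the builder, then hint the GC to run
    builder = null;
    System.gc();
    System.out.println(currentTime - oldTime);
}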
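The post does not say how the memory peaks were measured. If they came from a process monitor, they include heap the JVM has reserved but is not using. A rough sketch of measuring from inside the JVM instead, together with the 450 ms vs. 710 ms arithmetic, could look like this; ReadBenchmark and its methods are hypothetical names, and the heap delta is only indicative because a collection can run at any time.

```java
// Hypothetical helper, not part of the original post.
final class ReadBenchmark {

    // Heap currently in use by live and not-yet-collected objects.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the task once; returns { elapsed milliseconds, heap delta in bytes }.
    // The heap delta can be negative if a garbage collection happens in between.
    static long[] measure(Runnable task) {
        long heapBefore = usedHeapBytes();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long heapAfter = usedHeapBytes();
        return new long[] { elapsedMs, heapAfter - heapBefore };
    }

    // Expresses one timing as a percentage of another.
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }
}
```

With the numbers quoted above, ReadBenchmark.percentOf(450, 710) comes out at roughly 63.4, which matches the figure in the post.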

To get better performance you can use BufferedReader. This class lets you read the file line by line, which is much faster than wasting time reading it word by word. You can read a plain-text file (size: 1 MB) in half a second. Just use the following code.

File f = new File("File path");
StringBuilder builder = new StringBuilder();
// try-with-resources closes both readers when the block exits
try (BufferedReader br = new BufferedReader(new FileReader(f))) {
    String line;
    // readLine() strips the line terminator, so add "\n" back
    while ((line = br.readLine()) != null) {
        builder.append(line).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace();
}

You can check the time it takes to read the file with System.currentTimeMillis(), as you already do in your code.

Take a look at the link below, on reading really big files with Java (150 GB):

http://www.answerques.com/s1imeegPeQqU/reading-really-big-files-with-java
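The content of the linked page is not reproduced here. As a general illustration only: for files far larger than the heap, the usual approach is to process the lines as a stream rather than collect them into one String. The sketch below assumes the goal is to count lines containing some text; StreamingLineCounter and countMatchingLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not taken from the linked page.
final class StreamingLineCounter {

    // Counts lines containing the given text without holding the file in memory;
    // Files.lines reads lazily, so only a small window of the file is buffered.
    static long countMatchingLines(Path file, Charset charset, String needle) throws IOException {
        try (Stream<String> lines = Files.lines(file, charset)) {
            return lines.filter(line -> line.contains(needle)).count();
        }
    }
}
```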
