Reading a text file into a String without using a huge amount of memory

Question

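I tried to measure the performance of several approaches to reading a file into a String: NIO (the slowest for reading a single file), a BufferedInputStream reading the file line by line (600 ms per pass on average), and a FileReader reading the stream into an array that acts as a fixed-size buffer (the fastest).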

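The file is 95 MB of plain text in Windows .txt format. Converting the chars to a String really is the bottleneck, but what I noticed is the huge memory consumption of this method. For 95 MB of lorem ipsum, it consumes up to 1 GB of RAM. I have not found out why.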

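What I have tried, without any effect:

Invoking the garbage collector with System.gc(), and setting all pointer variables to null before the method ends (although they should go out of scope anyway, since they are defined inside the method).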

private void testCharStream() {
    File f = new File("c:/Downloads/test.txt");
    long oldTime = System.currentTimeMillis();
    char[] cbuf = new char[8192];
    StringBuilder builder = new StringBuilder();
    try {
        FileReader reader = new FileReader(f);

        int n;
        // read() returns how many chars it filled; append only those,
        // otherwise a short read appends stale chars left in the buffer.
        while ((n = reader.read(cbuf)) != -1) {
            builder.append(cbuf, 0, n);
        }

        reader.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    long currentTime = System.currentTimeMillis();

    System.out.println(currentTime - oldTime);
}

Answer 1

Try Apache Commons IO: http://commons.apache.org/proper/commons-io/ I have not benchmarked it, but I think the code is optimized.

Answer 2

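I came up with a decent solution. With the Apache Commons IO package, the memory peak was 777.1 MB, the minimum was 220 MB, and it took 710 ms on average to read the 95 MB text file.

What I did was set the variable that points to the StringBuilder object to null at the end of the method, and suggest that the garbage collector actually do its work (System.gc()). The memory peak was 540 MB, a little over half of the value reached before. Changing the buffer size to 1024 also brought a further improvement of about 40 ms per pass, from 490 ms to 450 ms or even less. So my function needs only 63.4% of the time Apache needs to read the file, which is almost 40% less. Any ideas on how to improve the performance further?

Here is the function.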

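private void testCharStream() {
    // Same file as in the question; the answer's snippet left this declaration out.
    File f = new File("c:/Downloads/test.txt");
    long oldTime = System.currentTimeMillis();
    char[] cbuf = new char[1024];
    StringBuilder builder = new StringBuilder();

    try {
        FileReader reader = new FileReader(f);

        int n;
        // Append only the chars actually read on this pass.
        while ((n = reader.read(cbuf)) != -1) {
            builder.append(cbuf, 0, n);
        }

        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    long currentTime = System.currentTimeMillis();
    // Drop the reference and hint the collector; System.gc() is only a request.
    builder = null;
    System.gc();
    System.out.println(currentTime - oldTime);
}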
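One possible contributor to the memory peak, offered as a guess rather than a measured result: on JVMs of that era a StringBuilder is backed by a char[] with two bytes per char, so 95 MB of single-byte text needs about 190 MB as characters, and each time the builder runs out of room it allocates a larger array and copies into it while the old one is still live. A minimal sketch that sizes the builder from the file length up front is shown below. The class name PresizedFileRead, the method readToString, and the explicit Charset parameter are hypothetical and not from the answers above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not taken from the thread.
final class PresizedFileRead {

    static String readToString(File file, Charset charset) throws IOException {
        // Use the byte length as the initial capacity so the builder rarely
        // has to grow. For single-byte and UTF-8 text the char count does not
        // exceed the byte count; if it ever did, the builder would still grow.
        int capacity = (int) Math.min(file.length(), Integer.MAX_VALUE - 8);
        StringBuilder builder = new StringBuilder(capacity);
        char[] cbuf = new char[8192];

        try (Reader reader = new InputStreamReader(new FileInputStream(file), charset)) {
            int n;
            // Append only the chars filled by this read.
            while ((n = reader.read(cbuf)) != -1) {
                builder.append(cbuf, 0, n);
            }
        }
        // toString() copies the characters once more into the resulting String.
        return builder.toString();
    }
}
```

It could be called as PresizedFileRead.readToString(new File("c:/Downloads/test.txt"), StandardCharsets.UTF_8), assuming the file is UTF-8. Whether this actually lowers the peak for the 95 MB file would have to be measured.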

Answer 3

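For better performance you can use BufferedReader. This class lets you read the file line by line. Rather than wasting time reading the file character by character, this approach gets the job done faster. You can read a plain text file (size: 1 MB) in under half a second. Just use the following code.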

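File f = new File("File path");

String line = "";
StringBuilder builder = new StringBuilder();
// The readers are opened inside the try so that a missing file is caught
// as well, and try-with-resources closes them when the block ends.
try (FileReader fr = new FileReader(f);
     BufferedReader br = new BufferedReader(fr)) {
    while ((line = br.readLine()) != null)
        builder.append(line + "\n");
}
catch (Exception e)
{
    e.printStackTrace();
}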

Using System.currentTimeMillis(), you can check how long it takes to read the file.
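As a rough illustration of that timing check, the sketch below averages several passes of a line-by-line read. It uses System.nanoTime() instead of System.currentTimeMillis() because it is intended for measuring elapsed time; that swap, the class name TimedLineRead, and both method names are assumptions made for this example. Note that readLine() drops the line terminator, so a Windows file's "\r\n" endings come back as "\n" here.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical helper, not taken from the thread.
final class TimedLineRead {

    // Reads the file line by line and joins the lines with '\n'.
    static String readLines(File file, Charset charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file.toPath(), charset)) {
            String line;
            while ((line = br.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }

    // Returns the average wall-clock time per pass in milliseconds.
    static long averageMillis(File file, Charset charset, int passes) throws IOException {
        if (passes <= 0) {
            throw new IllegalArgumentException("passes must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            readLines(file, charset);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / passes / 1_000_000;
    }
}
```

The first passes usually run slower while the JVM warms up and the file is not yet in the operating system's cache, so a single pass says little.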

Answer 4

Please see the link below, "Reading really big files with Java" (150 GB).

http://www.answerques.com/s1imeegPeQqU/reading-really-big-files-with-java
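The linked page is not reproduced here, so the following is not its method. It is only a general sketch of the usual idea for files far too large to hold as one String: process each line as it is read and keep nothing else in memory. The class name StreamingLineCount, the method countLinesContaining, and the choice of counting matching lines are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical helper, not taken from the thread or the linked page.
final class StreamingLineCount {

    // Counts the lines that contain 'needle'. Only one line is held in
    // memory at a time, so memory use does not depend on the file size
    // (as long as individual lines are of reasonable length).
    static long countLinesContaining(File file, Charset charset, String needle) throws IOException {
        long count = 0;
        try (BufferedReader br = Files.newBufferedReader(file.toPath(), charset)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

This trades the convenience of having the whole text as a String for flat memory use, so it only fits tasks that can be done one line at a time.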

4 answers

Answer 1: 1, accepted, 2013-08-25 11:58:34
Answer 2: 0
Answer 3: 0, 2013-08-25 20:35:37
Answer 4: 0, 2015-01-06 12:54:03