How can StringBuilder be used to read large text files in Java?

Is there any mechanism in Java to reduce the memory usage while reading large text files?

Almost every program I've come across uses String to read text files. But Java reserves space for each String literal, which is why I think memory usage increases, since all the String objects are stored. All the classes of java.io deal with String. But if we're not using StringBuilder, how can we reduce memory usage?

After all, reducing memory usage is the primary concern of StringBuilder (since it's not immutable like String). So how can we exploit this feature in Java I/O operations without using String, i.e. without using something like sb.append([String object])?

For simplicity, assume you have n strings, each of length 1, that you read from your input.

Using the + operator on strings while reading creates a new String object each time you concatenate, so you get strings of length 1, 2, 3, ..., n.

So the total memory usage of these strings combined is 1 + 2 + ... + n = n(n+1)/2 = O(n^2), in addition to the n strings you read from the input.

If instead you use StringBuilder to create the final string, you create n objects for the input (each of length 1) and one object of size n for the final string, so the total memory usage is 1 + 1 + ... + 1 + n = 2n = O(n).

So even if you use sb.append(String), the space usage is asymptotically better than creating all the intermediate strings, since you do not need to create intermediate String objects.

In addition, the performance (time) should be better when using StringBuilder, both because you create fewer objects and because, with lower memory usage, the gc doesn't need to work as hard as when concatenating strings naively.

(*) Note that it is easy to see that the above still holds for strings of any length.
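The argument above can be sketched in code. This is only an illustration under assumptions: the class and method names (ConcatVsBuilder, joinWithConcat, joinWithBuilder, charsHeldByConcat) are made up for this example, and the input pieces stand in for whatever you read from your input. The last method just adds up the lengths of the intermediate strings that naive concatenation produces; it does not measure real heap usage.

```java
import java.util.Arrays;
import java.util.List;

class ConcatVsBuilder {
    // Naive approach: each += produces a new String holding everything read so far.
    static String joinWithConcat(List<String> pieces) {
        String result = "";
        for (String piece : pieces) {
            result += piece;
        }
        return result;
    }

    // StringBuilder approach: pieces are copied into one growing buffer,
    // and a single final String is created at the end.
    static String joinWithBuilder(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String piece : pieces) {
            sb.append(piece);
        }
        return sb.toString();
    }

    // Sum of the lengths of all intermediate strings created by joinWithConcat.
    // For n pieces of length 1 this is 1 + 2 + ... + n.
    static long charsHeldByConcat(List<String> pieces) {
        long total = 0;
        long lengthSoFar = 0;
        for (String piece : pieces) {
            lengthSoFar += piece.length();
            total += lengthSoFar;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> pieces = Arrays.asList("a", "b", "c", "d");
        System.out.println(joinWithConcat(pieces));    // abcd
        System.out.println(joinWithBuilder(pieces));   // abcd
        System.out.println(charsHeldByConcat(pieces)); // 10 = 1 + 2 + 3 + 4
    }
}
```

Both methods return the same text; the difference is only in how many intermediate String objects exist along the way.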

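You can use StringBuilder's append(char) method to avoid creating intermediate strings; see this post: https://stackoverflow.com/a/9849624/102483. Keep in mind that there is no way to reduce the memory footprint of the final String so that it is smaller than the size of the file you are reading.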
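A minimal sketch of the append(char) approach, assuming the text arrives through any java.io.Reader. The class name CharByCharReader and the helper readAll are hypothetical, and a StringReader stands in for a real file here; for an actual file you would probably wrap the file's reader in a BufferedReader so that single-character reads stay cheap.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CharByCharReader {
    // Reads everything from the reader one char at a time,
    // so no intermediate String objects are created.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            // Reader.read() returns -1 at end of stream, otherwise a char value.
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The only String created is the final one.
        return sb.toString();
    }

    public static void main(String[] args) {
        // StringReader is a stand-in for a file-backed reader in this sketch.
        System.out.println(readAll(new StringReader("line one\nline two")));
    }
}
```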

Depending on what you are doing, you could create a pool of String and/or StringBuilder objects that are loaded with the values you need, cleared out, and then reused. You could configure the pool to grow to a maximum size, and if objects in the pool are not being used, set the references to them to null so that they will eventually be reclaimed by the garbage collector.
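One way such a pool might look for StringBuilder objects is sketched below. Everything here is an assumption made for illustration: the class StringBuilderPool, its methods acquire, release and idleCount, and the choice to drop surplus builders rather than null them out explicitly. The sketch is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StringBuilderPool {
    private final Deque<StringBuilder> idle = new ArrayDeque<>();
    private final int maxIdle;

    StringBuilderPool(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    // Hands out a pooled builder if one is available, otherwise a new one.
    StringBuilder acquire() {
        StringBuilder sb = idle.pollFirst();
        return (sb != null) ? sb : new StringBuilder();
    }

    // Clears the builder and keeps it for reuse, up to maxIdle builders.
    void release(StringBuilder sb) {
        sb.setLength(0); // empties the content but keeps the allocated capacity
        if (idle.size() < maxIdle) {
            idle.addFirst(sb);
        }
        // Otherwise the pool keeps no reference, so the builder can be garbage collected.
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        StringBuilderPool pool = new StringBuilderPool(2);
        StringBuilder sb = pool.acquire();
        sb.append("some line read from a file");
        System.out.println(sb);
        pool.release(sb);
        System.out.println(pool.acquire() == sb); // true: the same object is reused
    }
}
```

Reusing a builder this way avoids reallocating its internal buffer for every chunk of text, at the cost of holding on to that buffer while the builder sits in the pool.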

You might want to consider something like this:

  // data is a byte[] that holds the text to read
  BufferedReader reader =
    new BufferedReader(
      new InputStreamReader(
        new ByteArrayInputStream(data)));
  String line;

  // each iteration holds only the current line
  while ((line = reader.readLine()) != null)
    ...
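A self-contained version of the same idea might look like the sketch below. The class name LineByLine and the method countLines are hypothetical, the UTF-8 charset is an assumption about the data, and counting lines is only a placeholder for whatever per-line processing you need. The point is that each line can be handled and then discarded, so the whole text never has to be held as one String.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LineByLine {
    // Processes the data one line at a time; here the "processing" is just counting.
    static int countLines(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            int count = 0;
            // Only the current line is referenced; earlier lines become garbage.
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countLines(data)); // 3
    }
}
```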

See these links for more details:

BufferedReader for large ByteBuffer?

http://www.tutorialspoint.com/java/java_bytearrayinputstream.htm

Reader and its subclasses are based around char and char[]; only convenience methods use String. Since StringBuilder.append() accepts char[], you can avoid creating unnecessary String objects if you only use the methods built around char[].

Note that while this reduces the number of temporary String objects created, the overall memory requirements stay the same, since the gc would collect any String that would otherwise be created.
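As a rough sketch of the char[]-based approach, the following reads through Reader.read(char[], int, int) and appends with StringBuilder.append(char[], int, int). The class name CharBufferReader, the helper readAll, and the 8192-char buffer size are assumptions chosen for the example, and a StringReader stands in for a file-backed reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CharBufferReader {
    // Copies the reader's content into a StringBuilder through one reusable char[] buffer.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192]; // buffer size is an arbitrary choice for this sketch
        try {
            int n;
            // read(...) returns the number of chars read, or -1 at end of stream.
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, n); // append only the chars actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("some text from a reader")));
    }
}
```

No String is created until the final toString() call; the same char[] buffer is reused for every read.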

Instead of String, try using StringBuilder to append data read from a file. If you use String, you might end up creating multiple string objects in memory.
