Copying a Java text file into a String

I run into the following error when I try to store a large file in a String:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
    at java.lang.StringBuffer.append(StringBuffer.java:306)
    at rdr2str.ReaderToString.main(ReaderToString.java:52)

As is evident, I am running out of heap space. Basically, my program looks something like this:

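FileReader fr = new FileReader(<filepath>);
StringBuffer sb = new StringBuffer();   // default capacity: 16 chars
char[] b = new char[BLKSIZ];
int n;

// Append block by block; sb has to grow (and copy itself) again and again
while ((n = fr.read(b)) > 0)
     sb.append(b, 0, n);

String fileString = sb.toString();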

Can someone tell me why I am running into a heap space error? Thanks.

You are running out of memory because, the way you've written your program, it has to store the entire, arbitrarily large file in memory. You have 2 options:

  • You can increase the memory by passing command-line switches to the JVM:

     java -Xms<initial heap size> -Xmx<maximum heap size> 
  • You can rewrite your logic so that it deals with the file data as it streams in, thereby keeping your program's memory footprint low.

I recommend the second option. It's more work, but it's the right way to go.
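As a minimal sketch of the second option, assuming the work can be done one line at a time, the file can be read through a BufferedReader so that only the current line is in memory. Counting lines and characters here is just a stand-in for real processing, and the class and method names (StreamingStats, countLinesAndChars) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamingStats {
    /**
     * Hypothetical example: reads the file line by line and returns
     * {lineCount, charCount}. Only the current line is held in memory,
     * so the footprint does not grow with the file size.
     * Line terminators are not counted as characters.
     */
    public static long[] countLinesAndChars(String path) throws IOException {
        long lines = 0;
        long chars = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines++;
                chars += line.length(); // process the line here instead of storing it
            }
        }
        return new long[] { lines, chars };
    }

    public static void main(String[] args) throws IOException {
        long[] stats = countLinesAndChars(args[0]);
        System.out.println("lines=" + stats[0] + " chars=" + stats[1]);
    }
}
```

This only keeps memory flat if no single line is huge; for a file that is one enormous line, read fixed-size char blocks instead and process each block before reading the next.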

EDIT: To determine your system's defaults for the initial and maximum heap size, you can use this code snippet (which I stole from a JavaRanch thread):

public class HeapSize {
    public static void main(String[] args) {
        long kb = 1024;
        // totalMemory(): heap currently reserved by the JVM (roughly -Xms at startup)
        long heapSize = Runtime.getRuntime().totalMemory();
        // maxMemory(): the limit the heap is allowed to grow to (-Xmx)
        long maxHeapSize = Runtime.getRuntime().maxMemory();
        System.out.println("Heap Size (KB): " + heapSize / kb);
        System.out.println("Max Heap Size (KB): " + maxHeapSize / kb);
    }
}

  • You allocate a small StringBuffer that has to grow again and again. Preallocate it according to the file size, and your code will also be a LOT faster.

  • Note that Java strings are Unicode (16-bit chars) while your file likely is not, so you use... twice the file's size in memory.

  • Depending on the VM (32-bit? 64-bit?) and the limits set ( http://www.devx.com/tips/Tip/14688 ), you may simply not have enough memory available. How large is the file, actually?
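To answer "how large is the file?" against the heap limit, here is a rough sketch. It assumes 8-bit text, so each file byte becomes one 2-byte char, and it deliberately ignores object overhead, buffer growth and the later toString() copy. The class and method names (HeapCheck, estimatedCharBytes, probablyFits) are hypothetical.

```java
import java.io.File;

public class HeapCheck {
    /** A Java char is one 16-bit UTF-16 code unit. */
    static final long BYTES_PER_CHAR = 2;

    /**
     * Rough estimate of the heap needed to hold an 8-bit text file as chars:
     * one char per file byte, i.e. about twice the file size.
     */
    public static long estimatedCharBytes(long fileBytes) {
        return fileBytes * BYTES_PER_CHAR;
    }

    /** True if the estimate is below the JVM's maximum heap (-Xmx). */
    public static boolean probablyFits(File file) {
        long max = Runtime.getRuntime().maxMemory();
        return estimatedCharBytes(file.length()) < max;
    }

    public static void main(String[] args) {
        File f = new File(args[0]);
        System.out.println("File size (KB): " + f.length() / 1024);
        System.out.println("Max heap (KB): " + Runtime.getRuntime().maxMemory() / 1024);
        System.out.println("Probably fits: " + probablyFits(f));
    }
}
```

A "true" here is only a lower bound on what is needed, not a guarantee; other live objects share the same heap.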

By default, Java starts with a very small maximum heap (64 MB, on Windows at least). Is it possible you are trying to read a file that is too large?

If so, you can increase the heap with the JVM parameter -Xmx256M (which sets the maximum heap to 256 MB).

I tried running a slightly modified version of your code:

import java.io.FileReader;

public class ReaderToString {
    public static void main(String[] args) throws Exception {
        FileReader fr = new FileReader("<filepath>");
        StringBuffer sb = new StringBuffer();
        char[] b = new char[1000];
        int n = 0;
        // read() returns -1 at end of file, which ends the loop
        while ((n = fr.read(b)) > 0)
            sb.append(b, 0, n);
        fr.close();

        String fileString = sb.toString();
        System.out.println(fileString);
    }
}

on a small file (2 KB), and it worked as expected. You will need to set the JVM parameter.

Kris has the answer to your problem.

You could also look at Apache Commons IO's FileUtils.readFileToString, which may be a bit more efficient.

Although this might not solve your problem, here are some small things you can do to make your code a bit better:

  • create your StringBuffer with an initial capacity equal to the size of the file you are reading
  • close your FileReader at the end: fr.close();
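Both suggestions together might look like the sketch below. It swaps in StringBuilder (the unsynchronized equivalent of StringBuffer) and try-with-resources in place of an explicit fr.close(); the class and method names (PreallocatedRead, readFile) and the 8192-char block size are hypothetical choices.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class PreallocatedRead {
    public static String readFile(String path) throws IOException {
        File file = new File(path);
        long len = file.length();
        if (len > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single String: " + len + " bytes");
        }
        // For a single-byte encoding the char count equals the byte count, and for
        // UTF-8 it is at most the byte count, so this capacity avoids the repeated
        // grow-and-copy steps seen in the stack trace.
        StringBuilder sb = new StringBuilder((int) len);
        char[] buf = new char[8192];
        // try-with-resources closes the reader, like fr.close() in a finally block
        try (Reader fr = new FileReader(file)) {
            int n;
            while ((n = fr.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

This removes the growth copies, but toString() still makes one more copy of the chars, so the file still has to fit in the heap.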

In the OP, your program is aborting while the StringBuffer is being expanded. You should preallocate it to the size you need, or at least close to it. When a StringBuffer must expand, it needs RAM for both the original capacity and the new capacity. As TomTom also said, your file likely uses 8-bit characters, which will be converted to 16-bit Unicode in memory, so it will double in size.

The program has not even reached the next doubling yet: in Java 6, StringBuffer.toString() allocates a new String, and the internal char[] is copied again (in some earlier versions of Java this was not the case). At the time of this copy you need double the heap space, so at that moment at least 4 times your actual file size (for a 30 MB file: 30 MB * 2 for byte -> Unicode = 60 MB, then 60 MB * 2 for the toString() call = 120 MB). Once the method finishes, the GC will clean up the temporary objects.
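The same arithmetic written out for a file of S bytes, assuming (as the answer does) 8-bit text, 2-byte Java chars, a buffer sized exactly to the content, and the Java 6 behavior where toString() copies the char[]. The symbols M and S are introduced here for illustration:

```latex
\begin{aligned}
M_{\text{chars}} &= 2S && \text{(one 2-byte \texttt{char} per byte of 8-bit text)}\\
M_{\text{peak}} &= M_{\text{chars}} + 2S = 4S && \text{(buffer plus the copy made by \texttt{toString()})}\\
S = 30\ \text{MB} &\;\Rightarrow\; M_{\text{peak}} = 4 \times 30\ \text{MB} = 120\ \text{MB}
\end{aligned}
```

Unused capacity left over from the buffer's growth, or the old and new arrays coexisting during an expansion, pushes the real peak above 4S.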

If you cannot increase the heap space for your program, you will have some difficulty. You cannot take the "easy" route and just return a String. You can try to do this incrementally so that you do not need to worry about the file size (one of the best solutions).

Look at the web service code in your client. It may provide a way to use a class other than String, perhaps a java.io.Reader, a java.lang.CharSequence, or a special interface like the SAX-related org.xml.sax.InputSource. Each of these can be used to build an implementation class that reads from your file in chunks as the callers need it, instead of loading the whole file at once.

For instance, if your web service handling routines can take a CharSequence, then (if they are written well) you can create a special handler that returns just one character at a time from the file, but buffers the input. See this similar question: How to deal with big strings and limited memory.
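A minimal sketch of such a handler, assuming the file uses a single-byte encoding (ISO-8859-1), so that char index i is byte offset i. It keeps one 8 KB block of the file cached and reloads only when an index falls outside it. FileCharSequence and the block size are hypothetical; whether a given web service routine accepts a CharSequence depends on its API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical read-only CharSequence backed by a single-byte-encoded file. */
public class FileCharSequence implements CharSequence, AutoCloseable {
    private static final int BLOCK = 8192;
    private final RandomAccessFile file;
    private final int length;
    private final byte[] block = new byte[BLOCK];
    private long blockStart = -1; // file offset of the cached block; -1 = none loaded
    private int blockLen = 0;

    public FileCharSequence(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        long len = file.length();
        if (len > Integer.MAX_VALUE) {
            file.close();
            throw new IOException("CharSequence indexes are ints; file too long");
        }
        this.length = (int) len;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Reload only when the index falls outside the cached block.
        if (blockStart < 0 || index < blockStart || index >= blockStart + blockLen) {
            loadBlock((long) index / BLOCK * BLOCK);
        }
        // ISO-8859-1 maps each byte value 0..255 to the char with the same value.
        return (char) (block[(int) (index - blockStart)] & 0xFF);
    }

    private void loadBlock(long start) {
        try {
            file.seek(start);
            int toRead = (int) Math.min(BLOCK, length - start);
            file.readFully(block, 0, toRead);
            blockStart = start;
            blockLen = toRead;
        } catch (IOException e) {
            // CharSequence methods cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // Materializes only the requested range.
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        // Loads the whole file; a well-written caller should not need this.
        return subSequence(0, length).toString();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Any JDK API that takes a CharSequence can consume this, for example java.util.regex.Pattern.matcher(CharSequence). A caller that immediately calls toString() will still load everything, which is why the answer stresses "if they are written well".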

Trying to read an arbitrarily large file into main memory in an application is bad design. Period. No amount of JVM settings adjustments etc. is going to fix the core issue here. I recommend that you take a break and do some googling and reading about how to process streams in Java - here's a good tutorial and here's another good tutorial to get you started.

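Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.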

© 2020-2024 STACKOOM.COM