
Java GC overhead limit exceeded

I am trying to preprocess a large text file (10 GB) and store it in a binary file for future use. As the code runs, it slows down and ends with:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

The input file has the following structure:

200020000000008;0;2
200020000000004;0;2
200020000000002;0;2
200020000000007;1;2

This is the code I am using:

        String strLine;

        FileInputStream fstream = new FileInputStream(args[0]);
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream)); 

        //Read File Line By Line
        HMbicnt map = new HMbicnt("-1");
        ObjectOutputStream  outputStream = null;
        outputStream = new ObjectOutputStream(new FileOutputStream(args[1]));

        int sepIndex = 15;

        int sepIndex2 = 0;
        String str_i = "";
        String bb = "";
        String bbBlock = "init";

        int cnt = 0;
        lineCnt = 0;
        while ((strLine = br.readLine()) != null)   {
            // parse the line
            str_i = strLine.substring(0, sepIndex);
            sepIndex2 = strLine.substring(sepIndex+1).indexOf(';');
            bb = strLine.substring(sepIndex+1, sepIndex+1+sepIndex2);
            cnt = Integer.parseInt(strLine.substring(sepIndex+1+sepIndex2+1));
            if(!bb.equals(bbBlock)){
                outputStream.writeObject(map);
                outputStream.flush();
                map = new HMbicnt(bb);
                map.addNew(str_i + ";" + bb, cnt);
                bbBlock = bb;
            }
            else{
                map.addNew(str_i + ";" + bb, cnt);
            }
        }
        outputStream.writeObject(map);

        //Close the input stream
        br.close();
        outputStream.writeObject(map = null);
        outputStream.close();

Basically, it goes through the input file and stores the data in an HMbicnt object (which is a hash map). Once it encounters a new value in the second column, it should write the object to the output file, free the memory, and continue.

Thanks for any help.

I think the problem is not that 10 GB is in memory, but that you are creating too many HashMaps. Maybe you could clear the HashMap instead of re-creating it once you no longer need it. There seems to have been a similar problem in "java.lang.OutOfMemoryError: GC overhead limit exceeded", which is also about HashMaps.
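To illustrate the clear-and-reuse idea, here is a minimal sketch, not the asker's actual code. HMbicnt is not shown in the question, so a plain HashMap<String, Integer> stands in for it, and the names ReuseMapSketch, writeBlocks, writeAndClear and roundTrip are hypothetical. Summing the counts of repeated keys is also an assumption. One detail matters when a map is reused: ObjectOutputStream remembers every object it has written so that it can emit back-references, so a cleared and refilled map would be written again only as a reference to its first copy, and everything written so far stays reachable. Calling reset() after each write discards that table.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReuseMapSketch {

    // Reads "id;block;count" lines and writes one map per run of equal block values.
    // Returns the number of maps written.
    static int writeBlocks(BufferedReader in, ObjectOutputStream out) throws IOException {
        Map<String, Integer> map = new HashMap<>(); // stand-in for HMbicnt (assumption)
        String currentBlock = null;
        int written = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(";");
            String block = parts[1];
            if (currentBlock != null && !block.equals(currentBlock)) {
                writeAndClear(map, out);
                written++;
            }
            currentBlock = block;
            // Assumption: counts for a repeated key are summed.
            map.merge(parts[0] + ";" + block, Integer.parseInt(parts[2]), Integer::sum);
        }
        if (!map.isEmpty()) {
            writeAndClear(map, out);
            written++;
        }
        return written;
    }

    private static void writeAndClear(Map<String, Integer> map, ObjectOutputStream out)
            throws IOException {
        out.writeObject(map); // the map's current contents are serialized here
        out.reset();          // drop the stream's table of already-written objects
        map.clear();          // reuse the same map instance for the next block
    }

    // In-memory round trip, used only to demonstrate the sketch.
    static List<Map<String, Integer>> roundTrip(String input) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            int written;
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                written = writeBlocks(new BufferedReader(new StringReader(input)), out);
            }
            List<Map<String, Integer>> result = new ArrayList<>();
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < written; i++) {
                    @SuppressWarnings("unchecked")
                    Map<String, Integer> m = (Map<String, Integer>) in.readObject();
                    result.add(m);
                }
            }
            return result;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "200020000000008;0;2\n200020000000004;0;2\n200020000000007;1;2\n";
        System.out.println(roundTrip(sample));
    }
}
```

With a real file, writeBlocks would be given a BufferedReader over the input and an ObjectOutputStream over the output file. Whether this is enough depends on how large a single block can get, which the question does not say.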

Simply put, you're using too much memory. Since, as you said, your file is 10 GB, there is no way you're going to be able to fit it all into memory (unless, of course, you happen to have over 10 GB of RAM and have configured Java to use it).

From what I can tell from your code and your description of it, you're reading the entire file into memory and adding it to one huge in-RAM map as you go, then writing your result to output. This is not feasible. You'll need to redesign your code to work in place (i.e. keep only a small portion of the file in memory at any given time).
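As a rough sketch of keeping only a small portion of the file in memory, the converter below holds one line at a time and writes each line as a fixed-width binary record. Note that this swaps Java serialization of maps for a different output format, so it is not a drop-in replacement for the code in the question. It assumes every line has exactly three semicolon-separated fields, that the first fits in a long, and that the other two fit in an int. The 16-byte record layout and the names StreamingConvertSketch, convert and convertToBytes are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class StreamingConvertSketch {

    // Converts "id;block;count" lines into records of 8 + 4 + 4 bytes (big-endian).
    // Only the current line is held in memory. Returns the number of records written.
    static long convert(BufferedReader in, DataOutputStream out) throws IOException {
        long records = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int first = line.indexOf(';');
            int second = line.indexOf(';', first + 1);
            out.writeLong(Long.parseLong(line.substring(0, first)));
            out.writeInt(Integer.parseInt(line.substring(first + 1, second)));
            out.writeInt(Integer.parseInt(line.substring(second + 1)));
            records++;
        }
        return records;
    }

    // In-memory variant, used only to demonstrate the record layout.
    static byte[] convertToBytes(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            convert(new BufferedReader(new StringReader(input)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Usage: java StreamingConvertSketch <input.txt> <output.bin>
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(Paths.get(args[1]))))) {
            System.out.println(convert(in, out) + " records written");
        }
    }
}
```

Because nothing accumulates between lines, memory use does not depend on the size of the input. Any per-block aggregation would then have to happen when the binary file is read back.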

