简体   繁体   中英

Java GC overhead limit exceeded

I am trying to preprocess a large txt file (10G), and store it in binary file for future use. As the code runs it slows down and ends with

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

The input file has the following structure

200020000000008;0;2
200020000000004;0;2
200020000000002;0;2
200020000000007;1;2

This is the code I am using:

        String strLine;

        FileInputStream fstream = new FileInputStream(args[0]);
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream)); 

        //Read File Line By Line
        HMbicnt map = new HMbicnt("-1");
        ObjectOutputStream  outputStream = null;
        outputStream = new ObjectOutputStream(new FileOutputStream(args[1]));

        int sepIndex = 15;

        int sepIndex2 = 0;
        String str_i = "";
        String bb = "";
        String bbBlock = "init";

        int cnt = 0;
        lineCnt = 0;
        while ((strLine = br.readLine()) != null)   {
            //rozparsovat radek         
            str_i = strLine.substring(0, sepIndex);
            sepIndex2 = strLine.substring(sepIndex+1).indexOf(';');
            bb = strLine.substring(sepIndex+1, sepIndex+1+sepIndex2);
            cnt = Integer.parseInt(strLine.substring(sepIndex+1+sepIndex2+1));
            if(!bb.equals(bbBlock)){
                outputStream.writeObject(map);
                outputStream.flush();
                map = new HMbicnt(bb);
                map.addNew(str_i + ";" + bb, cnt);
                bbBlock = bb;
            }
            else{
                map.addNew(str_i + ";" + bb, cnt);
            }
        }
        outputStream.writeObject(map);

        //Close the input stream
        br.close();
        outputStream.writeObject(map = null);
        outputStream.close();

Basically, it goes through the in file and stores data to the object HMbicnt (which is a hash map). Once it encounters new value in second column it should write object to the output file, free memory and continue.

Thanks for any help.

I think the problem is not that 10G is in memory, but that you are creating too many HashMaps. Maybe you could clear the HashMap instead of re-creating it after you don't need it anymore. There seems to have been a similar problem in java.lang.OutOfMemoryError: GC overhead limit exceeded , it is also about HashMaps

Simply put, you're using too much memory. Since, as you said, your file is 10 GB, there is no way you're going to be able to fit it all into memory (unless, of course, you happen to have over 10 GB of RAM and have configured Java to use it).

From what I can tell from your code and description of it, you're reading the entire file into memory and adding it to one huge in-RAM map as you're doing so, then writing your result to output. This is not feasible. You'll need to redesign your code to work in-place (ie only keep a small portion of the file in memory at any given time).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM