Why is my hashset so memory-consuming?

I found that my program's memory keeps growing because of the code below. I am reading a file of about 7 GB, and I believe the data that ends up in the HashSet is less than 10 MB, but my program's memory keeps climbing to 300 MB and then it crashes with an OutOfMemoryError. If the HashSet is the problem, which data structure should I choose?

    if(tagsStr!=null) {
        if(tagsStr.contains("a")||tagsStr.contains("b")||tagsStr.contains("c")) {
            maTable.add(postId);
        }
    } else {
        if(maTable.contains(parentId)) {
            //do sth else, no memories added here
        }
    }

You haven't really told us what you're doing, but:

  • If your file is in something like ASCII, each character you read is one byte in the file but two bytes in memory (a Java char is a 16-bit UTF-16 code unit; Java 9 and later can store Latin-1 strings more compactly).
  • Each string has an object overhead. This can be significant if you're storing lots of small strings.
  • If you're reading lines with BufferedReader (or taking substrings from large strings), each one may share a large backing buffer (before Java 7 update 6, substring shares the original string's char array). You may want to use maTable.add(new String(postId)) to avoid this.
  • Each entry in the hash set needs a separate object to hold the key, hash code, value and next-entry reference. Again, with a lot of entries this adds up.
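A minimal sketch of the third point applied to the question's code. It assumes postId and parentId are Strings cut out of a larger line and that maTable is a HashSet<String>; the class TagFilter and its methods are hypothetical wrappers. The copy only matters on JVMs whose substring shares the parent's array (Java 6 and early Java 7); on later JVMs it is redundant but harmless.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper around the snippet from the question.
public class TagFilter {
    // Post ids whose tags matched, stored as independent copies.
    private final Set<String> maTable = new HashSet<>();

    /** Records matching posts; returns true if parentId was recorded earlier. */
    public boolean process(String tagsStr, String postId, String parentId) {
        if (tagsStr != null) {
            if (tagsStr.contains("a") || tagsStr.contains("b") || tagsStr.contains("c")) {
                // Copy so the set cannot keep a large shared char[] alive.
                maTable.add(new String(postId));
            }
            return false;
        }
        return maTable.contains(parentId);
    }

    public int size() {
        return maTable.size();
    }

    public static void main(String[] args) {
        TagFilter f = new TagFilter();
        f.process("<a><x>", "42", null);
        System.out.println(f.process(null, "43", "42")); // true
    }
}
```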

In short, it's quite possible that you're doing nothing wrong, but a combination of memory-increasing factors is working against you. Most of these are unavoidable, but the third one may be relevant.

You've either got a memory leak or your understanding of the amount of string data that you are storing is incorrect. We can't tell which without seeing more of your code.

The scientific solution is to run your application using a memory profiler, and analyze the output to see which of your data structures is using an unexpectedly large amount of memory.
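Alongside a real profiler, a crude in-process check can show whether heap use grows with the number of stored entries. This sketch uses only java.lang.Runtime; the figures are approximate because the collector runs when it chooses and System.gc() is only a request. HeapCheck and usedHeapBytes are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

public class HeapCheck {
    /** Approximate bytes currently in use on the heap. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // a hint only; the JVM may ignore it
        long before = usedHeapBytes();

        // Fill a set with many small strings, similar to storing post ids.
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            ids.add(Integer.toString(i));
        }

        System.gc();
        long after = usedHeapBytes();
        // ids is still reachable here, so its memory is counted.
        System.out.printf("%d entries, roughly %d bytes each%n",
                ids.size(), (after - before) / ids.size());
    }
}
```

The per-entry figure includes the String, its char array and the hash set's entry object, which is the overhead the other answers describe.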


If I were to guess, I'd say your application (at some level) is doing something like this:

    String line;
    while ((line = br.readLine()) != null) {
        // search for tag in line
        String tagStr = line.substring(pos1, pos2);
        // code as per your example
    }

This uses a lot more memory than you'd expect. On JVMs before Java 7 update 6, the substring(...) call creates a tagStr object that refers to the backing array of the original line string. Your tag strings, which you expect to be short, each actually hold a reference to a char[] containing every character of the original line.

The fix is to do this:

    String tagStr = new String(line.substring(pos1, pos2));

This creates a String object that does not share the backing array of the argument String.
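A self-contained version of that loop with the fix applied. The input format (one tag between the first '<' and the following '>' on each line) and the TagExtractor.extractTags helper are assumptions for illustration; in the original, pos1 and pos2 come from whatever parsing your program already does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TagExtractor {
    /** Hypothetical format: a line may carry one tag between '<' and '>'. */
    static List<String> extractTags(Reader in) throws IOException {
        List<String> tags = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                int pos1 = line.indexOf('<');
                int pos2 = line.indexOf('>', pos1 + 1);
                if (pos1 < 0 || pos2 < 0) {
                    continue; // no tag on this line
                }
                // new String(...) gives the tag its own char array, so the
                // long line can be collected even where substring shares it.
                String tagStr = new String(line.substring(pos1 + 1, pos2));
                tags.add(tagStr);
            }
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        String input = "id=1 <java> lots of other text\nno tag here\nid=2 <hashset> more\n";
        System.out.println(extractTags(new StringReader(input))); // [java, hashset]
    }
}
```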

UPDATE: given your latest data, this or something similar is an increasingly likely explanation.


To expand on another of Jon Skeet's points, the overhead of a small String is surprisingly high. For instance, on a typical 32-bit JVM running Java 6, the memory usage of a one-character String is:

  • String object header: 2 words
  • String object fields (value, offset, count and hash): 4 words
  • Backing array object header (including the array length): 3 words
  • Backing array data: 1 word (one 2-byte char, padded to the word)

Total: 10 words (40 bytes) to hold one char of data, or one byte of data if your input is in an 8-bit character set.
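The arithmetic can be reproduced in a few lines. This is an estimate under stated assumptions, not a measurement: a 32-bit HotSpot-style JVM with 4-byte words and references, 8-byte object alignment, an 8-byte object header, a 12-byte array header, and the Java 6 String layout with four 4-byte fields. StringFootprint and its methods are hypothetical names.

```java
public class StringFootprint {
    static final int WORD = 4;          // assumed 32-bit JVM
    static final int ALIGN = 8;         // assumed object alignment
    static final int OBJECT_HEADER = 8; // mark word + class pointer
    static final int ARRAY_HEADER = 12; // object header + int length

    /** Rounds size up to the next multiple of ALIGN. */
    static int align(int size) {
        return (size + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** Estimated bytes for a String of n chars (assumed Java 6 layout). */
    static int stringBytes(int n) {
        int stringObject = align(OBJECT_HEADER + 4 * WORD); // value, offset, count, hash
        int charArray = align(ARRAY_HEADER + 2 * n);        // 2 bytes per UTF-16 char
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        System.out.println(stringBytes(1));  // 40
        System.out.println(stringBytes(10)); // 56
    }
}
```

Under these assumptions a 10-character id costs about 56 bytes before any hash set entry overhead, more than five times the 10 bytes it occupies in an 8-bit file.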

(This is not sufficient to explain your problem, but you should be aware of it anyway.)

Couldn't it be that the data read into memory (from the 7 GB file) is somehow not freed? Something like what Jon describes: since strings are immutable, every string read requires creating a new String object, which might lead to running out of memory if the GC is not quick enough...

If that is the case, then you might insert 'breakpoints' into your code/iteration, i.e. at some defined points, call System.gc() and wait until it finishes.

Run your program with -XX:+HeapDumpOnOutOfMemoryError. You'll then be able to use a memory analyzer like MAT (Eclipse Memory Analyzer) to see what is using up all of the memory; it may be something completely unexpected.
