
Why am I getting an OutOfMemoryError resizing my HashTable implementation?

I am trying to rehash() my HashTable every time I get a collision, but I keep getting a Java heap space error.

Basically, I have a String[] table whose length I want to multiply by 2 every time I have a collision in my hash.

Edit: I am calling insert() in a while loop that loads around 300,000 words into the hash table.

    public void rehash() {
        String[] backup = table;
        size = size * 2;
        // I get the error on the line below
        table = new String[size];
        System.out.println("size" + size);
        for (int i = 0; i < backup.length; i++) {
            if (backup[i] != null) {
                insert(backup[i]);
            }
        }
    }

   public void insert(String str) {

        int index = hashFunction(str);

        if (index > size || table[index] != null) {
            rehash();
        }

        table[index] = str;
    }

My hash function (the signature is reconstructed from the call site in insert()):

    public int hashFunction(String s) {
        int val = 0;
        val = s.hashCode();
        if (val < 0) {
            val *= -1;
        }

        while (val > this.size) {
            val %= this.size;
        }

        return val;
    }


 public void load() {
        String str = null;
        try {
            BufferedReader in = new BufferedReader(new FileReader(location));
            while ((str = in.readLine()) != null) {
                insert(str);
            }
            in.close();
        } catch (Exception e) {
            System.out.println("exception");
        }
    }

From the hash function you have posted, it is not clear what it returns, but it looks like it has an issue.

int index = hashFunction(str);

Here, if your index is not correct, your code ends up doing a lot of recursive new String[size] allocations. Put a counter or a debug breakpoint here and check.

    if (index > size || table[index] != null) {
        rehash();
    }

No matter how big you make the table you cannot completely avoid collisions. Try this program for example:

System.out.println("Aaa".hashCode());
System.out.println("AbB".hashCode());
System.out.println("BBa".hashCode());
System.out.println("BCB".hashCode());

The output is:

65569
65569
65569
65569

They are four different strings with exactly the same hashcode. Exact collisions of this sort are not even that rare. (The hash algorithm used by the Java String class is not actually a very good one, but it is kept for backwards compatibility reasons.)

So, making the hashtable bigger (using a larger portion of the hashcode) reduces the number of collisions, but will never completely prevent them, because sometimes the hashcodes for different values are exactly the same.

A hashtable must be prepared to deal with a limited number of collisions by being able to store a set of different values in a single slot of the table. This is typically done by using a linked list for values that share the same hashcode. The current implementation of java.util.HashMap does something more advanced: if values with the same hashcode implement the Comparable interface (as String does), it uses that to arrange them in a binary tree. There is also a technique called dynamic perfect hashing, where collisions are prevented by dynamically changing the hash algorithm to ensure each distinct value gets a distinct hash, but that is more complex.
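As a minimal illustration of that linked-list approach (this is a sketch of mine, not code from the question; Node and chainedInsert are made-up names):

    // Minimal sketch of separate chaining: each slot holds a small
    // linked list of values whose hashes map to the same index.
    static class Node {
        String value;
        Node next;
        Node(String value, Node next) { this.value = value; this.next = next; }
    }

    Node[] buckets = new Node[1024];

    void chainedInsert(String str) {
        int index = (str.hashCode() & Integer.MAX_VALUE) % buckets.length;
        // If the value is already in this slot's chain, do nothing.
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.value.equals(str)) {
                return;
            }
        }
        // Otherwise prepend a new node; colliding values share the slot.
        buckets[index] = new Node(str, buckets[index]);
    }

With chaining, a collision just makes one chain slightly longer instead of forcing a resize of the whole table.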

A few other issues I see in your code:

  • There is no need to initialize val with 0 if you immediately assign something else to it on the next line. You can instead write int val; val = s.hashCode(); or simply int val = s.hashCode();.

  • The check if (val < 0) val *= -1; is not completely reliable, because if val is exactly equal to Integer.MIN_VALUE, multiplying it by -1 overflows and produces Integer.MIN_VALUE as the result. To completely prevent negative values, mask out the integer's sign bit by doing val &= Integer.MAX_VALUE;.

  • The condition here is wrong: while (val > this.size) val %= this.size;. It should be val >= this.size. However, there is no need to loop at all; doing the modulo operation once, unconditionally, is enough. Alternatively, if you maintain the table size as an exact power of 2, you can implement the mod operation as val &= (size - 1);, which is a little faster and, unlike %, also guarantees a non-negative result. (Both fixes are shown in the sketch after this list.)

  • In the insert method it would have to be if (index >= size ... , not if (index > size ... , but actually there is no need for that check at all, if the hash function already ensures the hash is in range.

  • When the table slot is already occupied, you need to check if it already contains the same string you are trying to insert (in which case you can return from the method immediately) and not just assume it's a different value with a collision.
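Putting the hash-function fixes above together, a corrected version might look like the following sketch. The second variant assumes the table size is kept as an exact power of two, which the original code does not guarantee:

    // Sketch of the fixes above; assumes `size` is a positive field.
    public int hashFunction(String s) {
        int val = s.hashCode();       // no separate zero-initialization needed
        val &= Integer.MAX_VALUE;     // clear the sign bit; safe even for Integer.MIN_VALUE
        return val % size;            // one unconditional modulo is enough
    }

    // Variant assuming `size` is an exact power of two:
    public int hashFunctionPow2(String s) {
        return s.hashCode() & (size - 1); // maps to [0, size), always non-negative
    }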

From the javadoc:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

If you know that the map will be used to store approximately N records, a good initialCapacity is N/.75 + N/10, which allows for a variance of about 10%.
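For example, with the roughly 300,000 words from the question, pre-sizing a java.util.HashMap this way means it never needs to rehash (the word-to-count value type here is just an arbitrary example):

    // Pre-size for ~300,000 entries: N/.75 plus ~10% slack.
    int n = 300_000;
    int initialCapacity = (int) (n / 0.75) + n / 10;
    java.util.Map<String, Integer> map = new java.util.HashMap<>(initialCapacity);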

  • It is OK to get an occasional OutOfMemoryError, but it is not OK to program in a way that relies on rehashing; try your best to avoid it.
  • As for rehashing, you shouldn't wait until a collision occurs. From the HashMap class:

This (resize) method is called automatically when the number of keys in this map reaches its threshold

where threshold = (int)(capacity * loadFactor);
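Applied to the String[] table from the question, that policy looks roughly like the sketch below. The count field and the linear-probing loop are my assumptions, not part of the original code; probing is just one possible way to resolve collisions without resizing:

    // Grow when the number of entries reaches the threshold,
    // not when a collision happens.
    static final double LOAD_FACTOR = 0.75;
    int count = 0;                     // occupied slots (assumed extra field)

    public void insert(String str) {
        if (count >= (int) (size * LOAD_FACTOR)) {
            rehash();
        }
        int index = hashFunction(str);
        // Resolve collisions with linear probing instead of resizing.
        while (table[index] != null) {
            if (table[index].equals(str)) {
                return;                // already present, nothing to do
            }
            index = (index + 1) % size;
        }
        table[index] = str;
        count++;
    }

    public void rehash() {
        String[] backup = table;
        size *= 2;
        table = new String[size];
        count = 0;                     // re-counted as entries are reinserted
        for (String s : backup) {
            if (s != null) {
                insert(s);
            }
        }
    }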
