Why is iterating a map slower than iterating a list?

Question

I was asked this question in an interview, and the interviewer wanted to discuss the trade-offs of all the approaches I could think of:

Design and implement a TwoSum class. It should support the following operations: add and find.

add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers whose sum is equal to the value.

I came up with the solution below first, which is very straightforward.

Design1:

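import java.util.HashMap;
import java.util.Map;

public class TwoSumDesign1 {
  // Maps each distinct number to the number of times it has been added.
  private final Map<Integer, Integer> map = new HashMap<>();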

  public void add(int number) {
    map.put(number, map.getOrDefault(number, 0) + 1);
  }

  public boolean find(int value) {
    for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
      int i = entry.getKey();
      int j = value - i;
      if ((i == j && entry.getValue() > 1) || (i != j && map.containsKey(j))) {
        return true;
      }
    }
    return false;
  }
}

But then, doing some research, I found out that we can use a List to store all the numbers, and that iterating a list is faster than iterating a keySet, but I still don't understand why.

Referenced from: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Design2:

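import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoSumDesign2 {
  // Each distinct number, stored once, for index-based iteration in find().
  private final List<Integer> list = new ArrayList<>();
  // Maps each distinct number to the number of times it has been added.
  private final Map<Integer, Integer> map = new HashMap<>();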

  // Add the number to an internal data structure.
  public void add(int number) {
    if (map.containsKey(number))
      map.put(number, map.get(number) + 1);
    else {
      map.put(number, 1);
      list.add(number);
    }
  }

  // Find if there exists any pair of numbers whose sum is equal to the value.
  public boolean find(int value) {
    for (int i = 0; i < list.size(); i++) {
      int num1 = list.get(i), num2 = value - num1;
      if ((num1 == num2 && map.get(num1) > 1) || (num1 != num2 && map.containsKey(num2)))
        return true;
    }
    return false;
  }
}

Can anyone explain all the trade-offs we should think about with this problem, and why the second solution is faster than iterating the map's keySet?
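Design1 and Design2 both keep add() cheap and pay O(n) in find(). The opposite trade-off, which neither design takes, is to precompute every pairwise sum when a number is added. The sketch below is hypothetical (the class name `TwoSumPrecomputedSums` is illustrative) and assumes a workload where find() is called far more often than add().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical alternative: pay at add() time so that find() is a single lookup.
class TwoSumPrecomputedSums {
  private final List<Integer> numbers = new ArrayList<>(); // every number added so far, duplicates included
  private final Set<Integer> sums = new HashSet<>();       // every pairwise sum seen so far

  // O(n): record the sum of the new number with each earlier number.
  public void add(int number) {
    for (int earlier : numbers) {
      sums.add(earlier + number);
    }
    numbers.add(number);
  }

  // Expected O(1): a single hash lookup.
  public boolean find(int value) {
    return sums.contains(value);
  }

  public static void main(String[] args) {
    TwoSumPrecomputedSums ts = new TwoSumPrecomputedSums();
    ts.add(1);
    ts.add(3);
    ts.add(5);
    System.out.println(ts.find(4));  // true  (1 + 3)
    System.out.println(ts.find(7));  // false
    System.out.println(ts.find(10)); // false (only one 5 so far)
    ts.add(5);
    System.out.println(ts.find(10)); // true  (5 + 5)
  }
}
```

Here add() costs O(n), find() is one expected-O(1) lookup, and the sums set can grow to O(n²) entries, so memory becomes the limiting factor. Which side to favour depends on the expected ratio of add() to find() calls.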

Answer 1

First of all, let me mention that the performance difference we are talking about is hardly worth considering. The phrase "Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important" is misleading. It is not very important. I would rather have worded it "Thus, you might not want to set the initial capacity..."

Now that we got that covered, let's move on to the actual answer.

It has to do with how the internal data structure of a hash map is organized, as compared to the simple organization of a list.

The standard implementation of a hash map uses an array of "buckets", where each bucket is a linked list of nodes. Keys and values are stored in these nodes. The bucket array is not densely populated, meaning that many of its entries are null. (Since Java 8, HashMap converts a bucket that grows long into a balanced tree, but that does not change the argument.)

Therefore, in order to traverse all the keys of a map, you have to walk the bucket array and, for each bucket, walk the nodes in that bucket.

Since there are as many nodes as there are keys, walking the nodes has the same time complexity as walking an entire ArrayList would. In the case of the hash map, however, we also have to count the overhead of walking the bucket array. The larger the initial capacity of the HashMap, or the smaller its load factor, the more null buckets there will be, which means more slots in the bucket array that you visit in vain, only to find out that they are null and proceed to the next one.

So, traversing a HashMap is slightly more expensive than traversing an ArrayList.
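A toy model can make that traversal cost visible. `ToyChainedTable` below is a hypothetical, heavily simplified chained hash set (fixed capacity, no resizing, no treeified buckets; it is not the real `java.util.HashMap` code). It counts one step for every bucket slot it examines and one for every node it visits, so a full traversal costs capacity plus size, which is what the Javadoc quoted in the question describes.

```java
import java.util.function.IntConsumer;

// Hypothetical toy chained hash set; NOT how java.util.HashMap is actually written
// (no resizing, no treeified buckets, no stored hash codes).
class ToyChainedTable {
  // One node in a bucket's chain.
  private static final class Node {
    final int key;
    Node next;

    Node(int key, Node next) {
      this.key = key;
      this.next = next;
    }
  }

  private final Node[] buckets; // fixed-size bucket array; most slots may stay null
  private int size;

  ToyChainedTable(int capacity) {
    buckets = new Node[capacity];
  }

  void add(int key) {
    int index = Math.floorMod(key, buckets.length); // trivial hash: key modulo capacity
    for (Node n = buckets[index]; n != null; n = n.next) {
      if (n.key == key) {
        return; // already present
      }
    }
    buckets[index] = new Node(key, buckets[index]); // prepend to the chain
    size++;
  }

  int size() {
    return size;
  }

  // Passes every key to action and returns the work done:
  // one step per bucket slot examined (null or not) plus one step per node visited.
  int forEachCountingSteps(IntConsumer action) {
    int steps = 0;
    for (Node head : buckets) {
      steps++;
      for (Node n = head; n != null; n = n.next) {
        steps++;
        action.accept(n.key);
      }
    }
    return steps;
  }

  public static void main(String[] args) {
    ToyChainedTable small = new ToyChainedTable(16);
    ToyChainedTable large = new ToyChainedTable(1024);
    for (int k : new int[] {1, 2, 3}) {
      small.add(k);
      large.add(k);
    }
    System.out.println(small.forEachCountingSteps(k -> { })); // 19 = 16 slots + 3 nodes
    System.out.println(large.forEachCountingSteps(k -> { })); // 1027 = 1024 slots + 3 nodes
  }
}
```

With the same three keys, the table with 1024 buckets takes 1027 steps against 19 for the one with 16 buckets, while a list of the same three elements needs only three.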

But believe me, the difference is so small that it is not really worth considering. Nobody will ever notice. It is much better to use the right data structure for your purpose, and not worry about minuscule gains in performance. The right data structure is always the data structure that yields the most elegant solution. The most elegant solution is the one that is easiest to read and understand what it does and how it does it.

Answer 2

The usual pitfall when iterating a Map is to iterate over the keySet while using get(key) to retrieve the value associated with each key. You have avoided this by iterating over the entrySet in Design1.
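A minimal illustration of the difference, assuming a `Map<Integer, Integer>` of counts like the one in Design1 (`IterationStyles` and its method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class IterationStyles {
  // Pitfall: each get(key) repeats a hash lookup for a key the iterator already found.
  static long sumViaKeySet(Map<Integer, Integer> counts) {
    long total = 0;
    for (Integer key : counts.keySet()) {
      total += counts.get(key);
    }
    return total;
  }

  // entrySet() hands back key and value together, so no extra lookups are needed.
  static long sumViaEntrySet(Map<Integer, Integer> counts) {
    long total = 0;
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      total += e.getValue();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<Integer, Integer> counts = new HashMap<>();
    counts.put(1, 2);
    counts.put(3, 1);
    counts.put(5, 4);
    System.out.println(sumViaKeySet(counts));   // 7
    System.out.println(sumViaEntrySet(counts)); // 7
  }
}
```

Both return the same total; the keySet() version simply performs one additional hash lookup per key. Since Java 8, `Map.forEach((k, v) -> ...)` is another way to receive key and value together.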

In practical terms, iterating over a HashMap will most likely be more expensive due to data locality. Compilers can introduce a number of optimizations when looping over an array. These won't be present when you have a list of Node objects backing the HashMap; see Bjarne Stroustrup: Why you should avoid Linked Lists.
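One caveat on the locality argument: an `ArrayList<Integer>` stores references to boxed `Integer` objects, so Design2's list is itself an array of pointers rather than a contiguous block of ints. If locality really were the bottleneck, one option is to keep the distinct numbers in a primitive `int[]`. The class below is a hypothetical variant of Design2 (`TwoSumPrimitiveArray` is an illustrative name); the lookups into the count map still go through boxed keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of Design2: distinct numbers live in a primitive int[]
// instead of an ArrayList<Integer>, so the scan in find() reads contiguous ints.
class TwoSumPrimitiveArray {
  private int[] distinct = new int[16]; // grows by doubling when full
  private int size = 0;                 // number of slots of 'distinct' in use
  private final Map<Integer, Integer> counts = new HashMap<>(); // number -> occurrences

  public void add(int number) {
    // merge() returns the updated count; 1 means this is the first occurrence.
    if (counts.merge(number, 1, Integer::sum) == 1) {
      if (size == distinct.length) {
        distinct = Arrays.copyOf(distinct, size * 2);
      }
      distinct[size++] = number;
    }
  }

  public boolean find(int value) {
    for (int i = 0; i < size; i++) {
      int num1 = distinct[i];
      int num2 = value - num1;
      // A number can pair with itself only if it was added at least twice.
      if (num1 == num2 ? counts.get(num1) > 1 : counts.containsKey(num2)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    TwoSumPrimitiveArray ts = new TwoSumPrimitiveArray();
    ts.add(1);
    ts.add(3);
    ts.add(5);
    System.out.println(ts.find(4)); // true  (1 + 3)
    System.out.println(ts.find(2)); // false (1 was added only once)
  }
}
```

Whether this actually beats Design2 is something to measure rather than assume.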

However, Design1 is easier to read and understand. That's very important; premature optimization is the root of all evil. The actual difference in performance should be measured before you decide to optimize the code. It could very well be that the new List introduced in Design2 actually decreases performance due to more indirection in memory access (two data structures vs one).
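A rough first measurement might look like the sketch below, which times entrySet() iteration over a HashMap against index-based iteration over an ArrayList holding the same keys. The workload size and number of rounds are arbitrary assumptions, and a hand-rolled System.nanoTime() loop is easily distorted by JIT warm-up and garbage collection; for numbers you intend to act on, a harness such as JMH (the OpenJDK Java Microbenchmark Harness) is the better tool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough sketch only: System.nanoTime() loops are distorted by JIT warm-up and GC.
class RoughIterationTiming {
  static long sumMap(Map<Integer, Integer> map) {
    long total = 0;
    for (Map.Entry<Integer, Integer> e : map.entrySet()) {
      total += e.getKey();
    }
    return total;
  }

  static long sumList(List<Integer> list) {
    long total = 0;
    for (int i = 0; i < list.size(); i++) {
      total += list.get(i);
    }
    return total;
  }

  public static void main(String[] args) {
    int n = 100_000; // hypothetical workload size
    Map<Integer, Integer> map = new HashMap<>();
    List<Integer> list = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      map.put(i, 1);
      list.add(i);
    }

    long sink = 0; // consume results so the JIT cannot discard the loops
    for (int round = 0; round < 5; round++) { // early rounds act as warm-up
      long t0 = System.nanoTime();
      sink += sumMap(map);
      long t1 = System.nanoTime();
      sink += sumList(list);
      long t2 = System.nanoTime();
      System.out.printf("round %d: map %d us, list %d us%n",
          round, (t1 - t0) / 1_000, (t2 - t1) / 1_000);
    }
    System.out.println(sink);
  }
}
```

The timings will vary from machine to machine and run to run, which is exactly why they should be read as a hint rather than a verdict.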

Answer 3

In the second design, two data structures are introduced (HashMap and List). As I understand it, when we talk about the performance of code, we should consider both aspects: the efficiency of the data structure and memory utilization.

In the second case, we need extra memory for the list.

Design1 is easier to read and understand, and it could very well be that the new List introduced in Design2 will actually decrease performance due to more indirection in memory access.


3 answers

solution1
4 2019-03-08 19:56:05

solution2
2 2019-03-08 20:09:15

solution3
-1 2019-03-09 08:58:08
