
Why is this memoization faster with an array than with a map?

I was solving combination sum IV on leetcode (#377), which reads: "Given an integer array with all positive numbers and no duplicates, find the number of possible combinations that add up to a positive integer target."

I solved it in Java using a top down recursive approach with a memoization array:

public int combinationSum4(int[] nums, int target){
    int[] memo = new int[target+1];
    for(int i = 1; i < target+1; i++) {
        memo[i] = -1; // -1 marks "not computed yet"
    }
    memo[0] = 1; // one way to make 0: the empty combination
    return topDownCalc(nums, target, memo);
}

public static int topDownCalc(int[] nums, int target, int[] memo) {
    if (memo[target] >= 0) {
        return memo[target];
    }
    
    int tot = 0;
    for(int num : nums) {
        if(target - num >= 0) {
            tot += topDownCalc(nums, target - num, memo);
        }
    }
    memo[target] = tot;
    return tot;
}

Then I figured I was wasting time by initializing the entire memo array and could just use a Map instead (which would also save space / memory). So I rewrote the code as follows:

public int combinationSum4(int[] nums, int target) {
    Map<Integer, Integer> memo = new HashMap<Integer, Integer>();
    memo.put(0, 1);
    return topDownMapCalc(nums, target, memo);
}

public static int topDownMapCalc(int[] nums, int target, Map<Integer, Integer> memo) {
    if (memo.containsKey(target)) {
        return memo.get(target);
    }
    
    int tot = 0;
    for(int num : nums) {
        if(target - num >= 0) {
            tot += topDownMapCalc(nums, target - num, memo);
        }
    }
    memo.put(target, tot);
    return tot;
}

I am confused, though, because after submitting the second version of my code LeetCode said it was slower and used more space than the first version. How does the HashMap use more space and run slower than an array whose values all had to be initialized and whose length is greater than the HashMap's size?

These things came to my mind first:

  1. HashMap is what the name implies, a hash-based map. So whenever you put something into it or get something out of it, it has to hash the key, then locate the target bucket based on that hash.
  2. The put() operation isn't just a walk in the park, either - you can check here to get an idea of what it does. It's definitely more than an array assignment.
  3. In Java it doesn't work with primitives, so for each value you have to convert ints to Integers and vice versa. (As noted by others, there are int-specialized map alternatives available, but not in the standard library.)
  4. And since you're not initializing it, it might need to resize internally several times during your run - the default capacity of a HashMap is just 16 - which is definitely more expensive than the one-shot initialization you did with the array (a pre-sizing sketch follows after this list); here's what each resizing does.
  5. It also works with Entry objects, one for each entry it holds, and all those objects take space as well - plenty more than just having an array of ints.

So I wouldn't expect a HashMap to save you either space or time here. Why would it?
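As an aside on point 4: if you do stay with a HashMap, the repeated internal resizing at least is avoidable by pre-sizing it. A minimal sketch, assuming the default 0.75 load factor (newMemo is just an illustrative helper, not part of the question's code):

import java.util.HashMap;
import java.util.Map;

class PresizedMemo {
    static Map<Integer, Integer> newMemo(int target) {
        // At most target + 1 distinct keys (0..target) can ever be stored, so pick a
        // capacity large enough that the 0.75 load factor never forces a rehash.
        Map<Integer, Integer> memo = new HashMap<>((int) ((target + 1) / 0.75f) + 1);
        memo.put(0, 1); // base case, as in the question
        return memo;
    }
}

That removes point 4, but points 1, 2, 3 and 5 still apply.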

"Then I figured I was wasting time by initializing the entire memo array"

You could have stored 'answer + 1' instead, so that the default value (0) becomes the placeholder for 'not calculated yet', and save that initialization. Not that it is expensive. Let's dig into cache lines.
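A minimal sketch of that trick, written as a hypothetical variant of the question's topDownCalc (0 now means 'not computed yet', so no initialization loop is needed):

// Hypothetical variant: memo[t] == 0 means "not computed yet",
// otherwise memo[t] holds (answer for t) + 1.
public static int topDownCalcShifted(int[] nums, int target, int[] memo) {
    if (memo[target] != 0) {
        return memo[target] - 1;
    }
    int tot = (target == 0) ? 1 : 0; // base case folded in, since memo[0] is no longer pre-set
    for (int num : nums) {
        if (target - num >= 0) {
            tot += topDownCalcShifted(nums, target - num, memo);
        }
    }
    memo[target] = tot + 1;
    return tot;
}

Called as return topDownCalcShifted(nums, target, new int[target + 1]); - Java zero-initializes new arrays, so the 'not computed' marker comes for free.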

Cache lines

CPUs are complex beasts. They don't operate on main memory directly; not anymore. They literally cannot: the chip's calculating parts simply aren't hooked up to main memory at all. Instead, the CPU has caches, and data moves between main memory and the cache in fixed-size units, typically 64 bytes each - such a unit is always a copy of exactly that much contiguous main memory, never more, never less (the cache as a whole is bigger, say 32-64 KB for an L1 cache). One such unit is called a cache line.

The CPU can only operate on data that is currently sitting in a cache line.

In Java, int[] leads to a contiguous, fixed-size chunk of memory holding the data. In other words, int[] x = new int[1000] reserves a single chunk of 1000 * 4 = 4000 bytes (ints are 4 bytes, and you reserved room for 1000 of them). That is one small, contiguous region spanning a few dozen consecutive cache lines - exactly the access pattern that prefetchers and pipelines handle best. So, when you write your loop to initialize the values to -1, you're asking the CPU to stream through that one region and write to it. With pipelining and other speedup factors, this costs maybe 250 cycles.

Contrast that with the cost of a cache miss: the CPU twiddles its thumbs (which is not all bad - it can cool down a bit, and on modern hardware the CPU is often limited not by its raw speed but by the system's ability to wick away the heat of running it; it can also spend the time on other threads/processes) while it farms out the job of fetching a chunk of main memory into a cache line to the memory controller. That thumb twiddling takes on the order of 500 cycles or more per miss. It's nice that the CPU gets to cool down or do other things in the meantime, but it's still the case that writing 4000 contiguous bytes in a tight loop costs no more than a cache miss or two.

Thus, 'fill a 1000-element int array with -1s' is an extremely cheap operation.
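Incidentally, that fill loop can be written as a single library call; a tiny equivalent of the initialization in the question:

int[] memo = new int[target + 1];
java.util.Arrays.fill(memo, 1, memo.length, -1); // mark indices 1..target as "not computed"
memo[0] = 1;                                     // one way to make 0: the empty combination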

Wrapper objects

Maps operate on objects, not ints, which is why you had to write Integer and not int. An Integer, in memory at least, is a much, much larger load on memory. It's an entire object containing an int field, and your variable (or your map) then holds a pointer to it.

So, an int[] x = new int[1000] takes 4000 bytes, plus some change for the object header (add maybe 12 bytes), plus 1 reference to it (depends on the VM, but say 8 bytes), for a grand total of roughly 4020 bytes.

In contrast,

Integer[] x = new Integer[1000];
for (int i = 0; i < 1000; i++) x[i] = i;

is much, much larger. It's 1000 pointers (as large as 8 bytes per pointer, or as small as 4, so 4000 to 8000 bytes) to 1000 separate Integer objects. Each Integer object pays the object header overhead (~12 bytes or more) plus 1 int field, generally padded to word alignment (so 8 bytes, even though the field is only 4, assuming a 64-bit VM on 64-bit hardware, which is what anything modern runs), for another ~20000 bytes. A grand total of something closer to 30000 bytes.

That is about 8x more memory required.
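If you want to check such back-of-the-envelope numbers on your own JVM, the JOL (Java Object Layout) tool can walk an object graph and sum up retained sizes. A rough sketch, assuming the org.openjdk.jol:jol-core dependency (exact figures vary with VM settings such as compressed oops):

import org.openjdk.jol.info.GraphLayout;

class FootprintDemo {
    public static void main(String[] args) {
        int[] primitives = new int[1000];
        Integer[] boxed = new Integer[1000];
        for (int i = 0; i < 1000; i++) boxed[i] = i; // autoboxing allocates Integer objects

        System.out.println("int[]     : " + GraphLayout.parseInstance(primitives).totalSize() + " bytes");
        // the (Object) cast keeps the Integer[] from being spread as separate varargs roots
        System.out.println("Integer[] : " + GraphLayout.parseInstance((Object) boxed).totalSize() + " bytes");
    }
}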

Then consider that the 'key' in your memoized array is implicit (it's just the index into the array), whereas in the map the key needs separate storage, and it gets worse still: each key/value pair in your map occupies at least 12+12+8+8+8+8 bytes (2 object headers and 2 padded int fields for the key and value Integer objects, plus 2 pointers for the map to refer to them), i.e. 56 bytes - and that's before counting the HashMap's own internal Entry node for the pair. Your int[] does the same job in 4 bytes per entry.

That gives you a ratio of 56/4 = 14.

If your map ends up containing only 1 in every 14 of the possible keys, it's about as large as your array would be, because the map can do one thing your array can't: the array has to be allocated at full size from the get-go, whereas the map only stores the entries that are actually needed.

Still, for most 'interesting' inputs, the coverage of that map is going to be far north of 1 in 14 (about 7.14%), and then the map comes out larger.

The map also has its objects smeared out all over memory, which spreads them across many different cache lines: a large memory footprint plus fragmentation is an easy road to having the CPU wait on cache miss after cache miss, versus doing all the work on a small, contiguous region and never having to wait.

Can it be faster?

Yeah, probably - but with map occupancy rates at 10% or higher, the idea of using a map to save space is dubious. If you want to try, you'd need a map specifically designed to hold ints and nothing else. These do exist, such as Eclipse Collections' IntIntMap.
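A hedged sketch of what that could look like, assuming the org.eclipse.collections:eclipse-collections dependency (getIfAbsent lets -1 stand in for 'not computed yet', so there is no boxing and no Entry object per key):

import org.eclipse.collections.impl.map.mutable.primitive.IntIntHashMap;

class PrimitiveMapMemo {
    static int combinationSum4(int[] nums, int target) {
        IntIntHashMap memo = new IntIntHashMap();
        memo.put(0, 1); // base case: one way to make 0
        return topDown(nums, target, memo);
    }

    static int topDown(int[] nums, int target, IntIntHashMap memo) {
        int cached = memo.getIfAbsent(target, -1); // -1 = "not computed yet"
        if (cached >= 0) {
            return cached;
        }
        int tot = 0;
        for (int num : nums) {
            if (target - num >= 0) {
                tot += topDown(nums, target - num, memo);
            }
        }
        memo.put(target, tot);
        return tot;
    }
}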

But I bet in this case the simple array memoization strategy is just the winner, even if you use IntIntMap.
