
Mapping large set of Keys to a small set of Values

If you had 1,000,000 keys (ints) that mapped to 10,000 values (ints), what would be the most efficient way to implement this, in terms of both lookup performance and memory usage?

Assume the values are random, i.e., there is no range of keys that maps to a single value.

The easiest approach I can think of is a HashMap, but I wonder if you can do better by grouping the keys that map to a single value.

Map<Integer,Integer> largeMap = Maps.newHashMap();
largeMap.put(1,4);
largeMap.put(2,232);
...
largeMap.put(1000000, 4);

If the set of keys is known to be in a given range (such as 1 to 1,000,000 in your example), the simplest approach is to use an array. Since you need to look up values by key, your options are essentially a map or an array.

The following uses a map from each value to itself, simply to avoid keeping duplicate instances of equal Integer objects (there may be a better way to do this, but I can't think of one). The array simply serves to look up values by index:

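private static void addToArray(Integer[] array, int key,
        Integer value, Map<Integer, Integer> map) {
    // computeIfAbsent returns the canonical instance (existing or newly stored).
    // putIfAbsent would return null the first time a value is seen.
    array[key] = map.computeIfAbsent(value, v -> v);
}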

And then values can be added using:

Map<Integer, Integer> keys = new HashMap<>();
Integer[] largeArray = new Integer[1000001];

addToArray(largeArray, 1, 4, keys);
addToArray(largeArray, 2, 232, keys);
...
addToArray(largeArray, 1000000, 4, keys);
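A minimal, self-contained sketch of the snippet above, written to check the interning behaviour. It uses `computeIfAbsent` because `Map.putIfAbsent` returns the *previous* value, which is `null` the first time a value is inserted; storing that result would leave the array slot empty. The class name `InternDemo` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {

    // Stores the canonical Integer for 'value' at array[key].
    // computeIfAbsent returns the instance already in the map, or stores
    // and returns 'value' if this value has not been seen before.
    static void addToArray(Integer[] array, int key, Integer value,
                           Map<Integer, Integer> map) {
        array[key] = map.computeIfAbsent(value, v -> v);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> keys = new HashMap<>();
        Integer[] largeArray = new Integer[1_000_001]; // index = key, so size is maxKey + 1

        addToArray(largeArray, 1, 232, keys);
        addToArray(largeArray, 2, 4, keys);
        addToArray(largeArray, 1_000_000, 232, keys);

        System.out.println(largeArray[1]);                          // 232
        System.out.println(largeArray[1] == largeArray[1_000_000]); // true: same instance
        System.out.println(largeArray[3]);                          // null: key never added
    }
}
```

The value 232 is used on purpose: it lies outside the small-integer range that `Integer.valueOf` caches by default, so the identity check shows the interning map is what makes the two slots share one object.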

If new Integer[1000001] seems like a hack, you can instead store an "index offset" that records which key corresponds to index 0 of the array.
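One way the index offset could look, as a sketch assuming keys fall in a known range `[minKey, maxKey]`: the array index is `key - minKey`, so the array needs exactly `maxKey - minKey + 1` slots. The names `OffsetArrayMap`, `minKey` and `maxKey` are hypothetical, not from the answer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical offset variant: keys in [minKey, maxKey] map to index key - minKey.
public class OffsetArrayMap {
    private final int minKey;       // key stored at index 0
    private final Integer[] values; // null means "no value for this key"
    private final Map<Integer, Integer> canonical = new HashMap<>();

    public OffsetArrayMap(int minKey, int maxKey) {
        this.minKey = minKey;
        this.values = new Integer[maxKey - minKey + 1];
    }

    // Keys outside [minKey, maxKey] throw ArrayIndexOutOfBoundsException.
    public void put(int key, Integer value) {
        values[key - minKey] = canonical.computeIfAbsent(value, v -> v);
    }

    // Returns null for an in-range key that was never put.
    public Integer get(int key) {
        return values[key - minKey];
    }

    public static void main(String[] args) {
        OffsetArrayMap map = new OffsetArrayMap(1, 1_000_000); // 1,000,000 slots
        map.put(1, 4);
        map.put(1_000_000, 4);
        System.out.println(map.get(1_000_000)); // 4
    }
}
```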


And I'd put that in a class:

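import java.util.HashMap;
import java.util.Map;

class LargeMap {

    // Canonical instance for each distinct value, so equal values share one Integer
    private final Map<Integer, Integer> keys = new HashMap<>();
    // keyArray[key] holds the value for that key (null if never put)
    private final Integer[] keyArray;

    public LargeMap(int size) {
        this.keyArray = new Integer[size];
    }

    public void put(int key, Integer value) {
        // computeIfAbsent (not putIfAbsent) so the slot never ends up null
        this.keyArray[key] = this.keys.computeIfAbsent(value, v -> v);
    }

    public Integer get(int key) {
        return this.keyArray[key];
    }
}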

And:

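public static void main(String[] args) {
    // Size must be maxKey + 1, since the key is used directly as the index.
    LargeMap myMap = new LargeMap(1_000_001);

    myMap.put(1, 4);
    myMap.put(2, 232);
    myMap.put(1_000_000, 4);

    System.out.println(myMap.get(1_000_000)); // 4
}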

I'm not sure you can optimize much here by grouping anything. A "reverse" mapping might give you slightly better performance if you want to look up by value instead of by key (i.e., get all keys with a certain value), but since you didn't explicitly say that you want to do this, I wouldn't go with that approach.
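For reference, a minimal sketch of what such a reverse mapping might look like, assuming it is kept alongside the forward key-to-value structure rather than replacing it. `ReverseIndex` and `keysFor` are hypothetical names; this version does not handle re-putting an existing key with a different value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reverse index: value -> all keys currently mapped to it.
public class ReverseIndex {
    private final Map<Integer, List<Integer>> keysByValue = new HashMap<>();

    // Appends key to the list for value; creates the list on first use.
    public void put(int key, int value) {
        keysByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(key);
    }

    // Read-only view of the keys for a value, or an empty list if none.
    public List<Integer> keysFor(int value) {
        List<Integer> keys = keysByValue.get(value);
        return keys == null ? Collections.emptyList() : Collections.unmodifiableList(keys);
    }

    public static void main(String[] args) {
        ReverseIndex index = new ReverseIndex();
        index.put(1, 4);
        index.put(2, 232);
        index.put(1_000_000, 4);
        System.out.println(index.keysFor(4));  // [1, 1000000]
        System.out.println(index.keysFor(99)); // []
    }
}
```

This only pays off for "which keys have value v?" queries; plain key lookups still go through the forward structure.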

For optimization, you can use an int array instead of a map if the keys are in a fixed range. Array lookup is O(1), and a primitive array uses less memory than a map, since it stores 4 bytes per int with no boxed Integer or entry objects.

int offset = -1;
int[] values = new int[1000000];
values[1 + offset] = 4;
values[2 + offset] = 232;
// ...
values[1000000 + offset] = 4;

If the range doesn't start at 1 you can adapt the offset.

There are also libraries like trove4j that provide better performance and more efficient storage for this kind of data than the standard collections, though I don't know how they compare to the simple array approach.

HashMap is the worst solution: the hash of an Integer is the int value itself (Integer.hashCode() returns the value). I would say a TreeMap if you want an easily available solution. You could also write your own specialized tree map, for example by splitting each key into two shorts and having a TreeMap within a TreeMap.
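A rough sketch of the "TreeMap within a TreeMap" idea, assuming the outer map is keyed by the high 16 bits of the int key and the inner map by the low 16 bits, both stored as `short`. `SplitKeyTreeMap`, `high` and `low` are hypothetical names. This only illustrates the structure; it makes no claim about memory savings, since boxed `Short` keys and `TreeMap` entries carry their own per-entry overhead.

```java
import java.util.TreeMap;

// Hypothetical two-level map: outer key = high 16 bits, inner key = low 16 bits.
public class SplitKeyTreeMap {
    private final TreeMap<Short, TreeMap<Short, Integer>> outer = new TreeMap<>();

    private static short high(int key) { return (short) (key >>> 16); }
    private static short low(int key)  { return (short) key; }

    public void put(int key, int value) {
        outer.computeIfAbsent(high(key), h -> new TreeMap<>()).put(low(key), value);
    }

    // Returns null if the key was never put.
    public Integer get(int key) {
        TreeMap<Short, Integer> inner = outer.get(high(key));
        // low(key) returns short, so it boxes to Short and matches the stored keys
        return inner == null ? null : inner.get(low(key));
    }

    public static void main(String[] args) {
        SplitKeyTreeMap map = new SplitKeyTreeMap();
        map.put(1, 4);
        map.put(2, 232);
        map.put(1_000_000, 4);
        System.out.println(map.get(1_000_000)); // 4
        System.out.println(map.get(3));         // null
    }
}
```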
