Optimal implementation of function in terms of performance

Question

I have a list of items and a map that stores, for each product, the data about its items. There are around 150k items in the DB and around 200k products (each product has approximately 1,000 to 2,000 items mapped to it).

I need a function that counts the number of products each item appears in. This is the function I have implemented:

public Map<Integer, Integer> getProductsNumberForItem(List<Item> itemsList,
        Map<Integer, Map<Item, Integer>> itemsAmount) {
    Map<Integer, Integer> result = new HashMap<>();
    for (Item i : itemsList) {
        int count = 0;
        for (Map<Item, Integer> entry : itemsAmount.values()) {
            if (entry.containsKey(i)) {
                count++;
            }
        }
        result.put(i.getID(), count);
    }
    return result;
}

It works fine on my test DB, which holds a small amount of data, but on real data it takes far too long (for example, it has already been running for an hour and still has not finished). Logically it's clear that I am performing too many operations, but I'm not sure how to optimize it.

Any suggestions are appreciated.

Answer 1

You have two options:

1. Most efficient: do the computation in a query executed in the database. With a count() aggregate and a GROUP BY clause you should get a much better result, as the whole processing is performed by the DBMS, which is designed and optimized for it.
2. Less efficient, but worth a try: retrieve the data as you do now and use multi-threading. With Java 8's parallelStream() you may get an acceptable result without the hassle of handling synchronization yourself.
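
The multi-threading option could look roughly like the sketch below. It keeps the question's counting logic and only runs the outer loop over the items in parallel. The Item class is a hypothetical stand-in (the post does not show the real one), and the class name ParallelCountSketch is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ParallelCountSketch {

    // Hypothetical stand-in for the question's Item class, which the post does not show.
    static final class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getID() { return id; }
        @Override public boolean equals(Object o) { return o instanceof Item && ((Item) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Same check as in the question (containsKey per product), with the items processed in parallel.
    // itemsAmount is only read here, never modified, so sharing it between threads is safe.
    static Map<Integer, Integer> getProductsNumberForItem(
            List<Item> itemsList, Map<Integer, Map<Item, Integer>> itemsAmount) {
        return itemsList.parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Item::getID,
                        item -> (int) itemsAmount.values().stream()
                                .filter(productItems -> productItems.containsKey(item))
                                .count(),
                        (first, second) -> first)); // if an ID repeats in the list, keep one count
    }
}
```

Note that this spreads the lookups over the available cores but does not reduce how many lookups are made, so the gain is bounded by the number of cores.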

Answer 2

The best option is to delegate this computation to the DB, which avoids transferring all the data to your application server.

If that is not an option, you can certainly improve your current algorithm. Right now, for each item in the list, you loop through all products. That is a cost proportional to (number of items) × (number of products): if the list holds all 150k items, roughly 150k × 200k = 30 billion containsKey calls on your data.

You could do this instead (I use streams because I find the reasoning easier to follow and they leave room for further improvements, but the same can be achieved without them):

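// Needs java.util.Map, java.util.function.Function, java.util.stream.Collectors and java.util.stream.Stream.
Stream<Item> productsItemsStream = itemsAmount.values().stream().flatMap(p -> p.keySet().stream());
Map<Item, Long> countByItemFound = productsItemsStream.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Look up by the Item itself: the map is keyed by Item, not by ID (relies on Item implementing equals/hashCode).
Map<Integer, Integer> result = itemsList.stream().collect(Collectors.toMap(Item::getID, i -> countByItemFound.getOrDefault(i, 0L).intValue()));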

With this approach you make one full pass over the product items and then another pass over the items list. That's linear cost.
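
For readers who prefer plain loops, the same two-pass idea can be sketched without streams. This is only an illustration under assumptions: the Item class below is a hypothetical stand-in for the one in the question, and it relies on equals/hashCode being defined so that the same item is recognized across products.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InvertedCountSketch {

    // Hypothetical stand-in for the question's Item class, which the post does not show.
    static final class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getID() { return id; }
        @Override public boolean equals(Object o) { return o instanceof Item && ((Item) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    static Map<Integer, Integer> getProductsNumberForItem(
            List<Item> itemsList, Map<Integer, Map<Item, Integer>> itemsAmount) {
        // Pass 1: visit every product once and tally each item it contains.
        Map<Item, Integer> productsPerItem = new HashMap<>();
        for (Map<Item, Integer> productItems : itemsAmount.values()) {
            for (Item item : productItems.keySet()) {
                productsPerItem.merge(item, 1, Integer::sum);
            }
        }
        // Pass 2: read the tally for each requested item; an item in no product gets 0.
        Map<Integer, Integer> result = new HashMap<>();
        for (Item item : itemsList) {
            result.put(item.getID(), productsPerItem.getOrDefault(item, 0));
        }
        return result;
    }
}
```

With the sizes from the question (about 200k products of 1,000 to 2,000 items each), pass 1 performs on the order of 200 to 400 million map updates, and pass 2 adds one lookup per listed item.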

Specific to streams, you could try enabling parallelism (using parallelStream in my solution), but a big performance gain is not guaranteed; it depends on several factors. I would first measure the performance of the proposed solution and, if needed, profile it with and without parallelStream in your scenario.
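
To make the "with and without" comparison easy, one possibility is to put the choice behind a flag so that both variants share the same code path. The sketch below is an assumption-laden illustration: the parallel parameter, the class name and the Item stand-in are all hypothetical, and Collectors.groupingByConcurrent is swapped in for groupingBy so that parallel threads can accumulate into one shared map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ToggleParallelSketch {

    // Hypothetical stand-in for the question's Item class, which the post does not show.
    static final class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getID() { return id; }
        @Override public boolean equals(Object o) { return o instanceof Item && ((Item) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // 'parallel' is a hypothetical switch added only so both variants can be timed on the same data.
    static Map<Integer, Integer> getProductsNumberForItem(
            List<Item> itemsList, Map<Integer, Map<Item, Integer>> itemsAmount, boolean parallel) {
        Stream<Map<Item, Integer>> products = parallel
                ? itemsAmount.values().parallelStream()
                : itemsAmount.values().stream();
        // Count, for every item, how many products list it.
        Map<Item, Long> countByItem = products
                .flatMap(productItems -> productItems.keySet().stream())
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        // Translate to the ID-keyed result; items found in no product get 0.
        Map<Integer, Integer> result = new HashMap<>();
        for (Item item : itemsList) {
            result.put(item.getID(), countByItem.getOrDefault(item, 0L).intValue());
        }
        return result;
    }
}
```

Both settings of the flag should return the same map, so any timing difference measured on real data comes from the parallel execution alone.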

2 answers

solution1
2 ACCEPTED 2017-08-09 20:34:24

solution2
0 2017-08-09 20:50:47
