Parallelizing double nested for loop

Question

I have very large lists and need to speed the whole thing up, so I'm trying to parallelize this for loop:

public HashMap<String, String> getData()
{
    // Both lists are ArrayList<String>
    HashMap<String, String> hashMap = new HashMap<>();
    for (int w = 0; w < firstList.size(); w++) {
        boolean once = false;
        for (int j = 0; j < secondList.size(); j++) {
            if (!once && secondList.get(j).var.startsWith(firstList.get(w).var.toLowerCase())) {
                hashMap.put(firstList.get(w).var, secondList.get(j).var);
                once = true;
            }
        }
    }
    return hashMap;
}

I've found this good answer, "Parallelizing a for loop", but I don't really understand how to apply it to my case. Should I create two Callable<output> instances for the <K, V> of my HashMap?

Or am I wrong to use this method?

Answer 1

I would start by rewriting it with streams. Not only will that make the code parallelizable, but it will also make it more readable. It will also avoid all the repetitions present in the original code, and make sure to iterate on the lists in an optimal way:

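import java.util.Map;
import java.util.stream.Collectors;

// Simple immutable pair holding one match.
private static final class Entry {
    private final String first;
    private final String second;

    Entry(String first, String second) {
        this.first = first;
        this.second = second;
    }

    String getFirst() { return first; }
    String getSecond() { return second; }
}

public Map<String, String> getData() {
    return firstList.stream()
        .flatMap(firstListElement -> {
            // Lowercase once per outer element instead of once per comparison.
            String lowercase = firstListElement.toLowerCase();
            return secondList.stream()
                             .filter(secondListElement -> secondListElement.startsWith(lowercase))
                             .limit(1)
                             .map(secondListElement -> new Entry(firstListElement, secondListElement));
        })
        // The merge function keeps toMap from throwing if firstList contains duplicates.
        .collect(Collectors.toMap(Entry::getFirst, Entry::getSecond, (a, b) -> a));
}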

Then I would measure the time it takes to execute that, and compare the time it takes to execute the same code but with firstList.parallelStream() instead.
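
A rough way to make that comparison is sketched below. The class name PrefixMatchTiming, the match helper and the sample data are hypothetical, and AbstractMap.SimpleEntry stands in for the Entry class to keep the example short. A single System.nanoTime() measurement ignores JIT warm-up, so for numbers you can trust a benchmark harness such as JMH is probably the better choice.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PrefixMatchTiming {

    // Same pipeline as above; the caller decides whether the outer stream is parallel.
    static Map<String, String> match(Stream<String> outer, List<String> secondList) {
        return outer
            .flatMap(first -> {
                String lowercase = first.toLowerCase();
                return secondList.stream()
                                 .filter(second -> second.startsWith(lowercase))
                                 .limit(1)
                                 .map(second -> new SimpleEntry<>(first, second));
            })
            // Keep the first value if the outer stream contains duplicate keys.
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a));
    }

    public static void main(String[] args) {
        // Tiny sample data; real measurements need lists of realistic size.
        List<String> firstList = Arrays.asList("App", "Ban", "Zzz");
        List<String> secondList = Arrays.asList("banana", "apple", "apricot");

        long start = System.nanoTime();
        Map<String, String> sequential = match(firstList.stream(), secondList);
        long sequentialNanos = System.nanoTime() - start;

        start = System.nanoTime();
        Map<String, String> parallel = match(firstList.parallelStream(), secondList);
        long parallelNanos = System.nanoTime() - start;

        System.out.println("sequential: " + sequentialNanos + " ns, parallel: " + parallelNanos + " ns");
        System.out.println("same result: " + sequential.equals(parallel));
    }
}
```

Because the inner stream stays sequential and ordered, both variants should pick the same match for each key; only the elapsed time differs.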

Answer 2

The problem is not how to parallelize the loop; you are using the wrong approach.
If I understand correctly, for each element of list 1 you want to add to a hashmap one entry from list 2 that starts with the same string.
First of all, I don't understand why you use the once variable instead of breaking out of the loop when you find a match.
Also, why do you need the once variable at all, since you can check whether the word from list 1 already exists in the hashmap?
Anyway, instead of a hashmap you should be using a TreeMap (check the NavigableMap interface), which can look up close matches.
Also, why can't you apply this logic when creating the lists in the first place?
Perhaps you are trying to optimise the wrong thing?
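
The sorted-collection idea could look roughly like this. The sketch uses a TreeSet (a NavigableSet) rather than a TreeMap, since only the strings themselves are needed, and it assumes both lists hold plain non-null Strings; the class and method names are hypothetical. One behavioural difference to keep in mind: ceiling returns the lexicographically smallest match, not the first match in secondList order as the original loop does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

final class SortedPrefixLookup {

    static Map<String, String> getData(List<String> firstList, List<String> secondList) {
        // Build the sorted set once; each lookup is then logarithmic instead of a full scan.
        NavigableSet<String> sorted = new TreeSet<>(secondList);
        Map<String, String> result = new HashMap<>();
        for (String first : firstList) {
            String prefix = first.toLowerCase();
            // Smallest element >= prefix. Strings sharing a prefix are contiguous in
            // sorted order, so if any element starts with the prefix, this one does.
            String candidate = sorted.ceiling(prefix);
            if (candidate != null && candidate.startsWith(prefix)) {
                result.put(first, candidate);
            }
        }
        return result;
    }
}
```

With this shape the nested loop disappears entirely, which may make parallelizing unnecessary.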

Answer 3

Something like this would do the work in parallel (on the outer list). But with a list of only 281 elements, this likely does not add much value.

On the inner list, if it only mattered that you find a match, not the first match, then that work could be parallelized as well, which would be more likely to have significant impact.

final ConcurrentMap<String, String> results = new ConcurrentHashMap<>();
firstList.stream()
         .unordered()
         .parallel()
         .map(v1 -> v1.var)
         .forEach(var -> {
             final String lowerVar1 = var.toLowerCase();
             secondList.stream()
                       .filter(v2 -> v2.var.startsWith(lowerVar1))
                       .findFirst()
                       .ifPresent(v2 -> results.put(var, v2.var));
         });
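
If any match will do, the inner search is where findAny() on a parallel stream comes in. Here is a minimal sketch, assuming plain String elements instead of objects with a var field; the class and method names are hypothetical. The outer loop is kept sequential here, so a plain HashMap is enough.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AnyMatchLookup {

    static Map<String, String> getData(List<String> firstList, List<String> secondList) {
        Map<String, String> results = new HashMap<>();
        for (String first : firstList) {
            String prefix = first.toLowerCase();
            secondList.parallelStream()
                      .filter(second -> second.startsWith(prefix))
                      // Any matching element, not necessarily the first in list order.
                      .findAny()
                      // Runs on the calling thread once the search has finished.
                      .ifPresent(second -> results.put(first, second));
        }
        return results;
    }
}
```

When several elements of secondList match the same prefix, repeated runs may return different values for that key, so this only fits if the choice of match truly does not matter.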

3 answers

solution1
1 2016-06-25 15:56:14

solution2
1 ACCEPTED 2016-06-25 16:20:53

solution3
0 2016-06-25 15:56:24
