Idiomatically enumerating a Stream of objects in Java 8

How can one idiomatically enumerate a Stream<T>, mapping each T instance to a unique integer, using Java 8 stream methods (e.g. for an array T[] values, creating a Map<T,Integer> where Map.get(values[i]) == i evaluates to true)?

Currently, I'm defining an anonymous class which increments an int field for use with the Collectors.toMap(..) method:

private static <T> Map<T, Integer> createIdMap(final Stream<T> values) {
    return values.collect(Collectors.toMap(Function.identity(), new Function<T, Integer>() {

        private int nextId = 0;

        @Override
        public Integer apply(final T t) {
            return nextId++;
        }

    }));
}

However, is there not a more concise/elegant way of doing this using the Java 8 stream API? Bonus points if it can be safely parallelized.

Your approach will fail if there is a duplicate element, because the two-argument Collectors.toMap(..) throws an IllegalStateException on duplicate keys.
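The failure is easy to reproduce. Here is a minimal sketch using the same counter idea as in the question; the class name DuplicateKeyDemo and the helper failsOnDuplicate are made up for illustration:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {

    // Hypothetical helper: returns true if toMap rejects the input because of a duplicate key.
    static boolean failsOnDuplicate(String... values) {
        int[] nextId = {0}; // mutable counter, standing in for the question's anonymous class
        try {
            Map<String, Integer> ignored = Arrays.stream(values)
                    .collect(Collectors.toMap(Function.identity(), v -> nextId[0]++));
            return false;
        } catch (IllegalStateException e) {
            // the two-argument toMap throws this when two elements map to the same key
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnDuplicate("a", "b", "c")); // false
        System.out.println(failsOnDuplicate("a", "b", "a")); // true
    }
}
```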

Besides that, your task requires mutable state and can therefore be solved with mutable reduction. When we populate a map, we can simply use the map's size to get an unused id.

The trickier part is the merge operation. The following merge simply repeats the assignments for the keys of the right map, which also handles potential duplicates.

private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.collect(HashMap::new, (m,t) -> m.putIfAbsent(t,m.size()),
        (m1,m2) -> {
            if(m1.isEmpty()) m1.putAll(m2);
            else m2.keySet().forEach(t -> m1.putIfAbsent(t, m1.size()));
        });
}
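To see what this version does and does not promise, here is a small sketch that repeats the method so it compiles on its own and adds a hypothetical check, hasDenseUniqueIds (not part of the answer). On a parallel stream the ids should still be unique and cover 0 .. size-1, but since the merge walks the right map's key set in HashMap order, I would not expect them to follow encounter order there.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PutIfAbsentIdMapDemo {

    // Same collector as in the answer, repeated so the example is self-contained.
    static <T> Map<T, Integer> createIdMap(Stream<T> values) {
        return values.collect(HashMap::new, (m, t) -> m.putIfAbsent(t, m.size()),
            (m1, m2) -> {
                if (m1.isEmpty()) m1.putAll(m2);
                else m2.keySet().forEach(t -> m1.putIfAbsent(t, m1.size()));
            });
    }

    // Hypothetical check: the ids are exactly 0 .. size-1, each used once.
    static boolean hasDenseUniqueIds(Map<?, Integer> ids) {
        Set<Integer> seen = new HashSet<>(ids.values());
        return seen.size() == ids.size()
            && IntStream.range(0, ids.size()).allMatch(i -> seen.contains(i));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");

        // Sequential: the duplicate "a" keeps its first id, so "c" gets id 2.
        System.out.println(createIdMap(input.stream()).get("c")); // 2

        // Parallel: ids stay unique and dense, whatever order they end up in.
        System.out.println(hasDenseUniqueIds(createIdMap(input.parallelStream()))); // true
    }
}
```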

If we rely on unique elements, or insert an explicit distinct(), we can use

private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.distinct().collect(HashMap::new, (m,t) -> m.put(t,m.size()),
        (m1,m2) -> {
            int leftSize=m1.size();
            if(leftSize==0) m1.putAll(m2);
            else m2.forEach((t,id) -> m1.put(t, leftSize+id));
        });
}
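Because the merge shifts every id of the right map by the size of the left map, this variant should keep ids aligned with encounter order even in parallel, assuming an ordered source such as a List. A rough sketch to check that; idsMatchPositions is a hypothetical helper and the method is repeated only to make the example self-contained:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class DistinctIdMapDemo {

    // Same collector as in the answer, repeated so the example is self-contained.
    static <T> Map<T, Integer> createIdMap(Stream<T> values) {
        return values.distinct().collect(HashMap::new, (m, t) -> m.put(t, m.size()),
            (m1, m2) -> {
                int leftSize = m1.size();
                if (leftSize == 0) m1.putAll(m2);
                else m2.forEach((t, id) -> m1.put(t, leftSize + id));
            });
    }

    // Hypothetical check: every element's id equals its position in the (duplicate-free) list.
    static <T> boolean idsMatchPositions(List<T> items, Map<T, Integer> ids) {
        return IntStream.range(0, items.size())
                .allMatch(i -> ids.get(items.get(i)) == i);
    }

    public static void main(String[] args) {
        // 1000 distinct strings from an ordered source
        List<String> items = IntStream.range(0, 1000)
                .mapToObj(i -> "item" + i)
                .collect(Collectors.toList());

        Map<String, Integer> ids = createIdMap(items.parallelStream());
        System.out.println(idsMatchPositions(items, ids));
    }
}
```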

I would do it in this way:

private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    List<T> list = values.collect(Collectors.toList());
    return IntStream.range(0, list.size()).boxed()
            .collect(Collectors.toMap(list::get, Function.identity()));
}

For the sake of parallelism, it can be changed to

   return IntStream.range(0, list.size()).parallel().boxed().
                (...)
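Spelled out, the parallel variant might look like the sketch below. This assumes the elided part is the same toMap call as in the sequential version, which the answer does not state explicitly; the class name and createIdMapParallel are made up. Note that toMap will still throw an IllegalStateException if the list contains duplicates.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class IndexedIdMapDemo {

    // Assumption: the "(...)" stands for the same collect call as in the sequential version.
    static <T> Map<T, Integer> createIdMapParallel(Stream<T> values) {
        List<T> list = values.collect(Collectors.toList());
        // Each index is its own id, so the result does not depend on processing order.
        return IntStream.range(0, list.size()).parallel().boxed()
                .collect(Collectors.toMap(list::get, Function.identity()));
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = createIdMapParallel(Stream.of("a", "b", "c"));
        System.out.println(ids.get("c")); // 2
    }
}
```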

Compared with the solution provided by Andremoniy, which first converts the input stream to a List, I would prefer a different approach: we don't know the cost of toList() and list.get(i), and it's unnecessary to create an extra List, which could be small or large.

private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    final MutableInt idx = MutableInt.of(0); // Or: final AtomicInteger idx = new AtomicInteger(0);        
    return values.collect(Collectors.toMap(Function.identity(), e -> idx.getAndIncrement()));
}
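For readers without a MutableInt class at hand, here is a sketch of the AtomicInteger variant mentioned in the comment, using only the JDK. The class name is made up. On a sequential stream the ids follow encounter order in practice; on a parallel stream they should still be unique, but which element gets which id depends on thread timing. Duplicate elements still make toMap throw.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class AtomicCounterIdMapDemo {

    // AtomicInteger variant of the method above; the counter is shared by all threads.
    static <T> Map<T, Integer> createIdMap(Stream<T> values) {
        final AtomicInteger idx = new AtomicInteger(0);
        return values.collect(Collectors.toMap(Function.identity(), e -> idx.getAndIncrement()));
    }

    public static void main(String[] args) {
        // Sequential: ids follow the order of the elements.
        System.out.println(createIdMap(Stream.of("x", "y", "z")).get("z")); // 2

        // Parallel: 1000 elements get 1000 different ids, in no particular order.
        Map<Integer, Integer> ids = createIdMap(IntStream.range(0, 1000).boxed().parallel());
        System.out.println(new HashSet<>(ids.values()).size()); // 1000
    }
}
```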

Regardless of the question, I think it's bad design to pass streams as method parameters.
