
Java: reduce a collection of strings to a map of occurrences

Consider a list such as id1_f, id2_d, id3_f, id1_g. How can I use a stream to reduce it to a Map<String, Integer> of counts like this:

id1 2
id2 1
id3 1

Note: the key is the part before the _. Can the reduce function help here?

This will get the job done:

Map<String, Long> map = Stream.of("id1_f", "id2_d", "id3_f", "id1_g")
  .collect(
    Collectors.groupingBy(v -> v.split("_")[0],
    Collectors.counting())
  );
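Note that Collectors.counting() produces Long values, while the question asks for a Map<String, Integer>. A minimal sketch of one way to get Integer counts directly, assuming Java 8 or later; the class name PrefixCount and the method name countByPrefix are hypothetical and only used for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PrefixCount {
    // Groups the strings by the text before the first "_" and
    // sums 1 per element, so each value is an Integer count.
    public static Map<String, Integer> countByPrefix(List<String> ids) {
        return ids.stream()
                .collect(Collectors.groupingBy(
                        s -> s.split("_")[0],
                        Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("id1_f", "id2_d", "id3_f", "id1_g");
        Map<String, Integer> counts = countByPrefix(ids);
        // Iteration order of the returned map is not guaranteed.
        counts.forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

With the sample list, counts.get("id1") is 2 and the other two keys map to 1.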

You can also use the toMap collector:

Map<String, Integer> result = myList.stream()
      .collect(Collectors.toMap((String s) -> s.split("_")[0],
                   (String s) -> 1, Math::addExact));

If you care about the order of the elements, collect the result into a LinkedHashMap:

myList.stream()
      .collect(Collectors.toMap((String s) -> s.split("_")[0], 
                   (String s) -> 1, Math::addExact, 
                     LinkedHashMap::new));

A non-stream approach using Map::merge:

Map<String, Integer> result = new LinkedHashMap<>();
myList.forEach(s -> result.merge(s.split("_")[0], 1, Math::addExact));
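To make the merge step concrete: for each key, Map.merge stores 1 if the key is absent, and otherwise replaces the stored value with the result of Math.addExact(oldValue, 1). The following is a self-contained sketch of that approach, assuming Java 8 or later; the class name MergeCount and the method name countInOrder are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeCount {
    // Counts occurrences of each prefix; the LinkedHashMap keeps keys
    // in the order in which they were first seen in the input.
    public static Map<String, Integer> countInOrder(List<String> ids) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String s : ids) {
            // Absent key: store 1. Present key: old value + 1.
            result.merge(s.split("_")[0], 1, Math::addExact);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("id1_f", "id2_d", "id3_f", "id1_g");
        Map<String, Integer> counts = countInOrder(ids);
        System.out.println(new ArrayList<>(counts.keySet())); // [id1, id2, id3]
        System.out.println(counts.get("id1"));                // 2
    }
}
```

Math.addExact throws an ArithmeticException on int overflow instead of silently wrapping around, which is the only practical difference from Integer::sum here.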

Since you want to count the elements, I'd suggest using Guava's Multiset interface, which is dedicated to this purpose.

The definition of Multiset from its JavaDoc:

A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.

Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element.

Here are two ways to use it:

1) Without the Stream API:

ImmutableMultiset<String> multiset2 = ImmutableMultiset.copyOf(Lists.transform(
        list, str -> StringUtils.substringBefore(str, "_")
));

2) Using the Stream API:

ImmutableMultiset<String> multiset = list.stream()
        .map(str -> StringUtils.substringBefore(str, "_"))
        .collect(ImmutableMultiset.toImmutableMultiset());

Note that instead of using something like s.split("_")[0], I used Apache Commons Lang's StringUtils.substringBefore, which I find much more readable.
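If adding Apache Commons Lang is not an option, the same key extraction can be approximated with plain JDK calls. This is only a sketch under that assumption; the class name KeyExtract and the helper keyOf are hypothetical and not part of any library:

```java
class KeyExtract {
    // Returns the text before the first "_", or the whole string
    // when it contains no "_" at all.
    public static String keyOf(String s) {
        int i = s.indexOf('_');
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(keyOf("id1_f")); // id1
        System.out.println(keyOf("id2"));   // id2
        // Prints true: the key of "_" is the empty string.
        System.out.println(keyOf("_").isEmpty());
    }
}
```

One difference worth knowing: "_".split("_") returns an empty array because String.split drops trailing empty strings, so s.split("_")[0] throws an ArrayIndexOutOfBoundsException for that input, whereas the indexOf-based helper returns an empty string.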

You can retrieve the count of an element with the Multiset.count() method.


 