Java Streams | groupingBy same elements

I have a stream of words and I would like to sort them according to the occurrence of identical elements (= words).

e.g. {hello, world, hello}

to

Map<String, List<String>>

hello, {hello, hello}

world, {world}

What I have so far:

Map<Object, List<String>> list = streamofWords.collect(Collectors.groupingBy(???));

Problem 1: The stream seems to lose the information that it is processing Strings, so the compiler forces me to change the type to Map<Object, List<String>>.

Problem 2: I don't know what to put inside the parentheses to group by identical elements. I know that I can process a single element within the lambda expression, but I have no idea how to reach "outside" each element to check for equality.

Thank you

To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, that is, by the function x -> x.

Map<String, List<String>> occurrences = 
     streamOfWords.collect(groupingBy(str -> str));

However, this is a bit useless, since you end up with the same information twice. You should look into a Map<String, Long> instead, where the value indicates the number of occurrences of the String in the Stream.

Map<String, Long> occurrences = 
     streamOfWords.collect(groupingBy(str -> str, counting()));
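
Both snippets can be put together into a small runnable sketch. The class and method names (WordGrouping, groupWords, countWords) are made up for illustration. Note that the stream is declared as Stream<String>; if it were declared as a raw Stream, the element type would be lost, which is a likely cause of Problem 1 in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class WordGrouping {

    // Groups equal words together: each word maps to the list of its occurrences.
    static Map<String, List<String>> groupWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str));
    }

    // Counts how many times each word occurs, using counting() as downstream collector.
    static Map<String, Long> countWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));
    }

    public static void main(String[] args) {
        // A stream can only be consumed once, so a fresh one is created per call.
        System.out.println(groupWords(Stream.of("hello", "world", "hello")));
        System.out.println(countWords(Stream.of("hello", "world", "hello")));
    }
}
```

The iteration order of the printed maps is not specified, since the result is not a sorted map.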

Basically, instead of having groupingBy return the values as a List, you use the downstream collector counting() to say that you want to count the number of times each value appears.

Your sort requirement implies that you should have a Map<Long, List<String>> (what if different Strings appear the same number of times?). The default groupingBy collector produces a HashMap in current implementations, which has no notion of ordering, but you could store the elements in a TreeMap instead.
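
One possible way to build such a map is sketched below. It is only one of several approaches; the class and method names (WordsByCount, wordsByCount) are hypothetical. It counts the words first and then regroups the entries by their count, passing TreeMap::new as the map factory so that the counts are sorted in ascending order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class WordsByCount {

    static TreeMap<Long, List<String>> wordsByCount(Stream<String> words) {
        // Step 1: count the occurrences of each word.
        Map<String, Long> counts =
                words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));

        // Step 2: regroup the words by their count; the TreeMap keeps the counts sorted.
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        // prints {1=[world], 2=[hello]}
        System.out.println(wordsByCount(Stream.of("hello", "world", "hello")));
    }
}
```

When several words share the same count, the order of the words inside each list is not specified, because it depends on the iteration order of the intermediate map.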


I've tried to summarize a bit what I've said in the comments.

You seem to have trouble with how str -> str can tell whether "hello" and "world" are different.

First of all, str -> str is a function, that is, for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that returns x + 2 for any value x.

Here we are using the identity function, that is, f(x) = x. When you collect the elements from the pipeline into the Map, this function is called first to obtain the key from each value. So in your example, you have 3 elements, for which the identity function yields:

f("hello") = "hello"
f("world") = "world"

So far so good.

Now when collect() is called, the function is applied to every value in the stream, and the result is used as the key in the Map. If the key already exists, the value the function was just applied to is added to the List currently mapped to that key; otherwise a new List is created for it. That's why you get a Map<String, List<String>> at the end.
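
The steps just described can be written out as a plain loop. This is only a conceptual sketch of the behaviour, not the actual JDK implementation; the class and method names (ManualGrouping, groupBy) are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
class ManualGrouping {

    static <T, K> Map<K, List<T>> groupBy(Iterable<T> values,
                                          Function<? super T, ? extends K> keyFunction) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            // Apply the function to the value to obtain its key.
            K key = keyFunction.apply(value);
            // Get the list already mapped to this key, or create it, then add the value.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped =
                groupBy(Arrays.asList("hello", "world", "hello"), str -> str);
        System.out.println(grouped.get("hello")); // prints [hello, hello]
        System.out.println(grouped.get("world")); // prints [world]
    }
}
```

The function never has to look "outside" the element it receives: equal elements simply produce equal keys, and the map does the rest.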

Let's take another example. Now the stream contains the values "hello", "world" and "hey", and the function that we want to apply to group the elements is str -> str.substring(0, 2), that is, the function that takes the first two characters of the String.

Similarly, we have:

f("hello") = "he"
f("world") = "wo"
f("hey") = "he"

Here you see that both "hello" and "hey" yield the same key when the function is applied, and hence they are grouped in the same List when collected, so that the final result is:

"he" -> ["hello", "hey"]
"wo" -> ["world"]
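
This example can be reproduced with a short sketch; the class and method names (PrefixGrouping, groupByPrefix) are hypothetical. It assumes every word has at least two characters, since substring(0, 2) throws a StringIndexOutOfBoundsException for shorter strings.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class PrefixGrouping {

    // Groups words by their first two characters (assumes length >= 2).
    static Map<String, List<String>> groupByPrefix(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str.substring(0, 2)));
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = groupByPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(grouped.get("he")); // prints [hello, hey]
        System.out.println(grouped.get("wo")); // prints [world]
    }
}
```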

To draw an analogy with mathematics, you could take any non-injective function, such as f(x) = x^2. For x = -2 and x = 2 we have f(x) = 4. So if we grouped integers by this function, -2 and 2 would end up in the same "bag".
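
The same analogy, written as a minimal sketch with a hypothetical class SquareGrouping: integers are grouped by their square, so -2 and 2 share the key 4.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class SquareGrouping {

    // Groups integers by the value of x * x.
    static Map<Integer, List<Integer>> groupBySquare(Stream<Integer> numbers) {
        return numbers.collect(Collectors.groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> grouped = groupBySquare(Stream.of(-2, 2, 3));
        System.out.println(grouped.get(4)); // prints [-2, 2]
        System.out.println(grouped.get(9)); // prints [3]
    }
}
```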

Looking at the source code won't help you understand what's going on at first. It's useful if you want to know how it's implemented under the hood. But first try to think about the concept at a higher level of abstraction, and then maybe things will become clearer.

Hope it helps! :)

The key extractor you are looking for is the identity function:

Map<String, List<String>> list = streamofWords.collect(Collectors.groupingBy(Function.identity()));

EDIT: added explanation:

  • Function.identity() returns a Function whose single method does nothing more than return the argument it receives.
  • Collectors.groupingBy(Function<S, K> keyExtractor) provides a collector that collects all elements of the stream into a Map<K, List<S>>. It uses the keyExtractor it is given to inspect the stream's objects of type S and derive a key of type K from each of them. This key is the map key used to get (or create) the list in the result map to which the stream element is added.

If you want to group by some fields of an object rather than by the whole object, and you don't want to change your equals and hashCode methods, I'd create a class holding a set of keys for grouping purposes:

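import java.util.Arrays;

import lombok.Getter; // Lombok generates getKeys()

@Getter
public class MultiKey {

    // The values that together form the grouping key
    private final Object[] keys;

    public MultiKey(Object... keys) {
        this.keys = keys;
    }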

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MultiKey multiKey = (MultiKey) o;
        return Arrays.equals(keys, multiKey.keys);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(keys);
    }

}

And the groupingBy itself:

Map<MultiKey, List<VhfEventView>> groupedList = list
        .stream()
        .collect(Collectors.groupingBy(
                 e -> new MultiKey(e.getGroupingKey1(), e.getGroupingKey2())));
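
For a version that can be run on its own, here is a minimal sketch under a few assumptions: MultiKey is re-declared without Lombok so the example has no external dependency, and Event with its fields station and channel is a hypothetical stand-in for VhfEventView and its grouping keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
class MultiKeyDemo {

    // Same idea as MultiKey above, written without Lombok.
    static final class MultiKey {
        private final Object[] keys;

        MultiKey(Object... keys) {
            this.keys = keys;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MultiKey && Arrays.equals(keys, ((MultiKey) o).keys);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(keys);
        }
    }

    // Hypothetical element type standing in for VhfEventView.
    static final class Event {
        final String station;
        final int channel;
        final String message;

        Event(String station, int channel, String message) {
            this.station = station;
            this.channel = channel;
            this.message = message;
        }
    }

    // Groups events that share both station and channel.
    static Map<MultiKey, List<Event>> groupByStationAndChannel(List<Event> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> new MultiKey(e.station, e.channel)));
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("A", 16, "first"),
                new Event("B", 16, "second"),
                new Event("A", 16, "third"));
        Map<MultiKey, List<Event>> grouped = groupByStationAndChannel(events);
        System.out.println(grouped.size());                         // prints 2
        System.out.println(grouped.get(new MultiKey("A", 16)).size()); // prints 2
    }
}
```

Because equals and hashCode are based on the array contents, two MultiKey instances built from equal values address the same map entry, which is what makes the lookup in main work.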
