
Java 8 Lambda groupingBy X and Y simultaneously

I'm looking for a lambda to refine data that has already been retrieved. I have a raw result set; if the user does not change the date, I want to use Java lambdas to group the existing results. I'm new to lambdas in Java.

The lambda I'm looking for works similarly to this query:

select z, w, min(x), max(x), avg(x), min(y), max(y), avg(y) from table group by x, w;

So I'm assuming you have a List of objects and you want to create a map with the given groupings. I am a bit confused by your x, y, w, z so I'll use my own fields. Here's how I would do it:

interface Entry {
    String getGroup1();
    String getGroup2();
    int getIntData();
    double getDoubleData();
}

List<Entry> dataList;
Map<String, Map<String, IntSummaryStatistics>> groupedStats = 
    dataList.stream()
        .collect(Collectors.groupingBy(Entry::getGroup1,
            Collectors.groupingBy(Entry::getGroup2,
                Collectors.summarizingInt(Entry::getIntData))));

Then if you want to get, say, the average of the data for items with groups A, B, you use:

groupedStats.get("A").get("B").getAverage();
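For reference, here is a minimal, self-contained sketch of the same nested grouping that can be run as-is. The `Row` class, the `NestedGroupingDemo` wrapper, and the sample values are assumptions made up for illustration; they stand in for whatever implements `Entry` in your code.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingDemo {

    // Hypothetical row type standing in for the Entry interface above
    static final class Row {
        private final String group1;
        private final String group2;
        private final int intData;

        Row(String group1, String group2, int intData) {
            this.group1 = group1;
            this.group2 = group2;
            this.intData = intData;
        }

        String getGroup1() { return group1; }
        String getGroup2() { return group2; }
        int getIntData() { return intData; }
    }

    // Group by group1, then by group2, and summarize intData per inner group
    static Map<String, Map<String, IntSummaryStatistics>> summarize(List<Row> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(Row::getGroup1,
                Collectors.groupingBy(Row::getGroup2,
                    Collectors.summarizingInt(Row::getIntData))));
    }

    // Made-up sample data
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("A", "B", 10),
            new Row("A", "B", 20),
            new Row("A", "C", 5));
    }

    public static void main(String[] args) {
        IntSummaryStatistics ab = summarize(sampleRows()).get("A").get("B");
        // One summary object carries count, sum, min, max and average
        System.out.println(ab.getMin() + " " + ab.getMax() + " " + ab.getAverage());
    }
}
```

Note that `get("A")` returns null when no row has that group, so the chained `get("A").get("B")` call throws a NullPointerException for missing groups; check for null first if the keys come from user input.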

If you want to summarise more than one set of data simultaneously then it gets a bit more complicated. You need to write your own wrapper class that can accumulate multiple statistics. Here's an example with both data items in Entry (I made them an int and a double to make it a bit more interesting).

class CompoundStats {
    private final IntSummaryStatistics intDataStats = new IntSummaryStatistics();
    private final DoubleSummaryStatistics doubleDataStats = new DoubleSummaryStatistics();

    public void add(Entry entry) {
        intDataStats.accept(entry.getIntData());
        doubleDataStats.accept(entry.getDoubleData());
    }

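    public CompoundStats combine(CompoundStats other) {
        intDataStats.combine(other.intDataStats);
        doubleDataStats.combine(other.doubleDataStats);
        return this;
    }

    // Accessors so callers can read the accumulated statistics
    public IntSummaryStatistics getIntStats() {
        return intDataStats;
    }

    public DoubleSummaryStatistics getDoubleStats() {
        return doubleDataStats;
    }
}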

This class can then be used to create your own collector:

Map<String, Map<String, CompoundStats>> groupedStats = 
    dataList.stream()
        .collect(Collectors.groupingBy(Entry::getGroup1,
            Collectors.groupingBy(Entry::getGroup2,
                Collector.of(CompoundStats::new, CompoundStats::add, CompoundStats::combine))));

Now your maps return a CompoundStats instead of an IntSummaryStatistics:

groupedStats.get("A").get("B").getDoubleStats().getAverage();
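As an aside, if you are on Java 12 or later (an assumption; the answer above targets Java 8), `Collectors.teeing` can feed each element to two collectors at once, which avoids writing the mutable wrapper. This is a different technique from the hand-written collector above, not a restatement of it. The `Row` and `BothStats` classes below are hypothetical names used only for this sketch.

```java
import java.util.DoubleSummaryStatistics;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TeeingDemo {

    // Hypothetical row type standing in for the Entry interface above
    static final class Row {
        private final String group1;
        private final String group2;
        private final int intData;
        private final double doubleData;

        Row(String group1, String group2, int intData, double doubleData) {
            this.group1 = group1;
            this.group2 = group2;
            this.intData = intData;
            this.doubleData = doubleData;
        }

        String getGroup1() { return group1; }
        String getGroup2() { return group2; }
        int getIntData() { return intData; }
        double getDoubleData() { return doubleData; }
    }

    // Simple holder for the two finished summaries
    static final class BothStats {
        final IntSummaryStatistics intStats;
        final DoubleSummaryStatistics doubleStats;

        BothStats(IntSummaryStatistics intStats, DoubleSummaryStatistics doubleStats) {
            this.intStats = intStats;
            this.doubleStats = doubleStats;
        }
    }

    static Map<String, Map<String, BothStats>> summarize(List<Row> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(Row::getGroup1,
                Collectors.groupingBy(Row::getGroup2,
                    // teeing (Java 12+) runs both downstream collectors
                    // over the same elements and merges their results
                    Collectors.teeing(
                        Collectors.summarizingInt(Row::getIntData),
                        Collectors.summarizingDouble(Row::getDoubleData),
                        BothStats::new))));
    }
}
```

With this, `summarize(rows).get("A").get("B").doubleStats.getAverage()` reads the same value as the wrapper-based version, under the assumptions stated.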

Also note that this would be neater if you created a separate class to hold your groupings rather than using the two-step map I've proposed above. Again, that is not a difficult modification if required.
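A sketch of what such a grouping class could look like follows. `GroupKey`, `Row`, and `GroupKeyDemo` are hypothetical names chosen for this example. The important part is that the key implements `equals` and `hashCode`, since it is used as a HashMap key.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class GroupKeyDemo {

    // Hypothetical composite key holding both grouping values
    static final class GroupKey {
        final String group1;
        final String group2;

        GroupKey(String group1, String group2) {
            this.group1 = group1;
            this.group2 = group2;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GroupKey)) return false;
            GroupKey other = (GroupKey) o;
            return Objects.equals(group1, other.group1)
                && Objects.equals(group2, other.group2);
        }

        @Override
        public int hashCode() {
            return Objects.hash(group1, group2);
        }
    }

    // Hypothetical row type standing in for the Entry interface above
    static final class Row {
        private final String group1;
        private final String group2;
        private final int intData;

        Row(String group1, String group2, int intData) {
            this.group1 = group1;
            this.group2 = group2;
            this.intData = intData;
        }

        String getGroup1() { return group1; }
        String getGroup2() { return group2; }
        int getIntData() { return intData; }
    }

    // A single flat map keyed by (group1, group2) instead of a map of maps
    static Map<GroupKey, IntSummaryStatistics> summarize(List<Row> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(
                (Row r) -> new GroupKey(r.getGroup1(), r.getGroup2()),
                Collectors.summarizingInt(Row::getIntData)));
    }
}
```

A lookup then becomes `summarize(rows).get(new GroupKey("A", "B"))`, which needs only one null check rather than one per map level.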

Hopefully this is useful in your own case.

I'm going to be using the Tuple2 type from jOOλ for this exercise, but you can also create your own tuple type if you want to avoid the dependency.

I'm also assuming you're using this to represent your data:

class A {
    final int w;
    final int x;
    final int y;
    final int z;

    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

You can now write:

Map<Tuple2<Integer, Integer>, Tuple2<IntSummaryStatistics, IntSummaryStatistics>> map =
Stream.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(

    // This is your GROUP BY criteria
    a -> tuple(a.z, a.w),
    Collector.of(

        // When collecting, we'll aggregate data into two IntSummaryStatistics
        // for x and y
        () -> tuple(new IntSummaryStatistics(), new IntSummaryStatistics()),

        // The accumulator will simply take new t = (x, y) values
        (r, t) -> {
            r.v1.accept(t.x);
            r.v2.accept(t.y);
        },

        // The combiner will merge two partial aggregations,
        // in case this is executed in parallel
        (r1, r2) -> {
            r1.v1.combine(r2.v1);
            r1.v2.combine(r2.v2);

            return r1;
        }
    )
));

Or even better (using the latest jOOλ API):

Map<Tuple2<Integer, Integer>, Tuple2<IntSummaryStatistics, IntSummaryStatistics>> map =

// Seq is like a Stream, but sequential only, and with more features
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))

// Seq.groupBy() is just short for Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),

    // Because once you have tuples, why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

The map structure is now:

(z, w) -> (all_aggregations_of(x), all_aggregations_of(y))

Calling toString() on the above map will produce:

{
    (1, 1) = (IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2}, 
              IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}), 
    (4, 9) = (IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9}, 
              IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7}), 
    (5, 2) = (IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5}, 
              IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})
}

You have all your statistics now.
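If you would rather avoid the jOOλ dependency, as mentioned earlier, the same aggregation can be sketched with a small hand-rolled pair type. The `Pair` class, the `key` and `xyStats` helpers, and the `PairStatsDemo` wrapper are assumptions made up for this example and are not part of any library; the sample rows are the ones used above.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PairStatsDemo {

    static final class A {
        final int w, x, y, z;

        A(int w, int x, int y, int z) {
            this.w = w; this.x = x; this.y = y; this.z = z;
        }
    }

    // Hypothetical minimal tuple; equals/hashCode make it usable as a map key
    static final class Pair<T1, T2> {
        final T1 v1;
        final T2 v2;

        Pair(T1 v1, T2 v2) { this.v1 = v1; this.v2 = v2; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair<?, ?> other = (Pair<?, ?>) o;
            return Objects.equals(v1, other.v1) && Objects.equals(v2, other.v2);
        }

        @Override
        public int hashCode() { return Objects.hash(v1, v2); }

        @Override
        public String toString() { return "(" + v1 + ", " + v2 + ")"; }
    }

    // GROUP BY criteria: (z, w)
    static Pair<Integer, Integer> key(A a) {
        return new Pair<>(a.z, a.w);
    }

    // Collects x into the first summary and y into the second
    static Collector<A, ?, Pair<IntSummaryStatistics, IntSummaryStatistics>> xyStats() {
        return Collector.of(
            () -> new Pair<>(new IntSummaryStatistics(), new IntSummaryStatistics()),
            (r, a) -> { r.v1.accept(a.x); r.v2.accept(a.y); },
            (r1, r2) -> { r1.v1.combine(r2.v1); r1.v2.combine(r2.v2); return r1; });
    }

    static Map<Pair<Integer, Integer>, Pair<IntSummaryStatistics, IntSummaryStatistics>>
            aggregate(Stream<A> rows) {
        return rows.collect(Collectors.groupingBy(PairStatsDemo::key, xyStats()));
    }

    // Same sample rows as in the jOOλ example
    static Stream<A> sample() {
        return Stream.of(
            new A(1, 1, 1, 1), new A(1, 2, 3, 1),
            new A(9, 8, 6, 4), new A(9, 9, 7, 4),
            new A(2, 3, 4, 5), new A(2, 4, 4, 5), new A(2, 5, 5, 5));
    }

    public static void main(String[] args) {
        aggregate(sample()).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The trade-off is a few extra lines for `Pair` in exchange for having no third-party library on the classpath.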

Side note

Of course, I don't know your exact requirements, but I suspect you'll quickly need more sophisticated aggregations in your report, such as medians, inverse distribution, and all sorts of nice OLAP features, which is when you realise that SQL is just a much easier language for this kind of task.
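To give a feel for the extra work involved, here is a rough sketch of a per-group median using only standard collectors. The summary-statistics classes have no median, so the values of each group have to be collected into a list and sorted. `Row`, `median`, and `MedianDemo` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MedianDemo {

    // Hypothetical row type: one grouping column and one value column
    static final class Row {
        private final String group;
        private final int value;

        Row(String group, int value) {
            this.group = group;
            this.value = value;
        }

        String getGroup() { return group; }
        int getValue() { return value; }
    }

    // Median of a non-empty list; averages the two middle values for even sizes
    static double median(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid);
        }
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    // Collect each group's values into a list, then reduce the list to its median
    static Map<String, Double> medianByGroup(List<Row> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(Row::getGroup,
                Collectors.collectingAndThen(
                    Collectors.mapping(Row::getValue, Collectors.toList()),
                    MedianDemo::median)));
    }
}
```

Unlike the summary statistics, this keeps every value of a group in memory until the group is finished.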

On the other hand, we'll definitely add more SQLesque features to jOOλ. This topic has also inspired me to write a full blog post with more details about the described approach.
