Complex custom Collector with Java 8
I have a stream of objects which I would like to collect in the following way.
Let's say we are handling forum posts:
class Post {
    private Date time;
    private Data data;
}
I want to create a list which groups posts by period. If there were no posts for X minutes, a new group should be created.
class PostsGroup {
    List<Post> posts = new ArrayList<>();
}
I want to get a List<PostsGroup> containing the posts grouped by the interval.
Example: an interval of 10 minutes.
Posts:
[{time:x, data:{}}, {time:x + 3, data:{}}, {time:x + 12, data:{}}, {time:x + 45, data:{}}]
I want to get a list of post groups:
[
{posts : [{time:x, data:{}}, {time:x + 3, data:{}}, {time:x + 12, data:{}}]},
{posts : [{time:x + 45, data:{}}]}
]
The first group lasts until X + 22 (10 minutes after the post at x + 12). Then a new post was received at X + 45. Is this possible?
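For reference, the grouping rule described in the question can be sketched as a plain loop over bare minute offsets instead of Post objects. The class GapGrouping and its method groupByGap are hypothetical; the sketch assumes the input is already sorted and that "no posts for X minutes" means the gap to the previous post is strictly greater than X.

```java
import java.util.ArrayList;
import java.util.List;

class GapGrouping {
    // Hypothetical helper: splits ascending timestamps into groups, starting a new
    // group whenever the gap to the previous timestamp exceeds maxGap.
    static List<List<Long>> groupByGap(List<Long> sortedTimes, long maxGap) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = null;
        Long previous = null;
        for (Long t : sortedTimes) {
            if (previous == null || t - previous > maxGap) {
                current = new ArrayList<>(); // first element or gap too large: open a new group
                groups.add(current);
            }
            current.add(t);
            previous = t;
        }
        return groups;
    }
}
```

With x = 0 and the offsets from the example (0, 3, 12, 45) and a gap of 10, this yields [[0, 3, 12], [45]], which matches the expected result above.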
This problem can be easily solved using the groupRuns method of my StreamEx library:
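long MAX_INTERVAL = TimeUnit.MINUTES.toMillis(10);
List<PostsGroup> groups = StreamEx.of(posts)
        // keep two adjacent posts in the same group while they are at most MAX_INTERVAL apart
        .groupRuns((p1, p2) -> p2.time.getTime() - p1.time.getTime() <= MAX_INTERVAL)
        .map(PostsGroup::new)
        .toList();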
I assume that you have a constructor:
class PostsGroup {
    private List<Post> posts;

    public PostsGroup(List<Post> posts) {
        this.posts = posts;
    }
}
The StreamEx.groupRuns method takes a BiPredicate which is applied to two adjacent input elements and returns true if they must be grouped together. It creates a stream of lists where each list represents one group. This method is lazy and works fine with parallel streams.
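To make the groupRuns semantics concrete, here is an eager, sequential sketch of the same adjacent-pair idea without StreamEx. RunGrouping.groupRuns is a hypothetical helper written only for illustration; unlike the StreamEx method it is neither lazy nor parallel-friendly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class RunGrouping {
    // Hypothetical eager helper: consecutive elements stay in the same list
    // as long as sameGroup.test(previous, next) returns true.
    static <T> List<List<T>> groupRuns(List<T> input, BiPredicate<? super T, ? super T> sameGroup) {
        List<List<T>> runs = new ArrayList<>();
        List<T> current = new ArrayList<>();
        T previous = null;
        for (T element : input) {
            if (!current.isEmpty() && !sameGroup.test(previous, element)) {
                runs.add(current);            // close the finished run
                current = new ArrayList<>();
            }
            current.add(element);
            previous = element;
        }
        if (!current.isEmpty()) {
            runs.add(current);                // close the last run
        }
        return runs;
    }
}
```

Calling groupRuns(Arrays.asList(0, 3, 12, 45), (a, b) -> b - a <= 10) gives [[0, 3, 12], [45]], the same shape the StreamEx pipeline produces before mapping each list to a PostsGroup.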
You need to retain state between stream entries and write yourself a grouping classifier. Something like this would be a good start:
class Post {
    private final long time;
    private final String data;

    public Post(long time, String data) {
        this.time = time;
        this.data = data;
    }

    @Override
    public String toString() {
        return "Post{" + "time=" + time + ", data=" + data + '}';
    }
}
public void test() {
    long t = 0;
    List<Post> posts = Arrays.asList(
            new Post(t, "One"),
            new Post(t + 1000, "Two"),
            new Post(t + 10000, "Three")
    );
    // Start a new group whenever more than 5 seconds pass without a post.
    Map<Long, List<Post>> grouped = posts
            .stream()
            .collect(Collectors.groupingBy(new ClassifyByTimeBetween(5000)));
    grouped.forEach((key, value) -> System.out.println(key + " -> " + value));
}
class ClassifyByTimeBetween implements Function<Post, Long> {
    final long delay;
    long currentGroupBy = -1; // key of the current group: time of the post that opened it
    long lastDateSeen = -1;   // time of the previously classified post (-1 = none yet)

    public ClassifyByTimeBetween(long delay) {
        this.delay = delay;
    }

    @Override
    public Long apply(Post p) {
        if (lastDateSeen >= 0) {
            if (p.time > lastDateSeen + delay) {
                // Gap longer than delay: this post opens a new group.
                currentGroupBy = p.time;
            }
        } else {
            // First time - start there.
            currentGroupBy = p.time;
        }
        lastDateSeen = p.time;
        return currentGroupBy;
    }
}
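Because ClassifyByTimeBetween keeps mutable state, it only produces meaningful keys on a sequential stream whose posts are in time order. Also, the single-argument groupingBy returns a HashMap, which makes no ordering guarantee. The sketch below, assuming a simplified Post with a millisecond time, keeps the groups ordered by passing TreeMap::new as the map factory; classifyByGap is a hypothetical lambda-based rewrite of the same classifier logic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ClassifierDemo {
    // Minimal stand-in for the Post class above (time in milliseconds).
    static final class Post {
        final long time;
        final String data;
        Post(long time, String data) { this.time = time; this.data = data; }
        @Override public String toString() { return data; }
    }

    // Same stateful rule as ClassifyByTimeBetween; only valid for sequential, time-ordered streams.
    static Function<Post, Long> classifyByGap(long delay) {
        long[] state = {-1, -1}; // state[0] = current group key, state[1] = last time seen
        return p -> {
            if (state[1] < 0 || p.time > state[1] + delay) {
                state[0] = p.time; // first post, or gap longer than delay: open a new group
            }
            state[1] = p.time;
            return state[0];
        };
    }

    static Map<Long, List<Post>> group(List<Post> posts, long delay) {
        // TreeMap keeps the groups sorted by the timestamp that opened them.
        return posts.stream()
                .collect(Collectors.groupingBy(classifyByGap(delay), TreeMap::new, Collectors.toList()));
    }
}
```

For the three posts in test() and a delay of 5000, this yields {0=[One, Two], 10000=[Three]}.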
Since no one has provided a solution with a custom collector, as required in the original problem statement, here is a collector implementation that groups Post objects based on the provided time interval.
The Date class mentioned in the question is a legacy class since Java 8 (superseded by the java.time API) and is not recommended for new projects. Hence, LocalDateTime will be used instead.
For testing purposes, I've implemented Post as a Java 16 record (if you substitute it with a class, the collector itself will be compatible with Java 8, provided the finisher is adjusted as described in the note below):
public record Post(LocalDateTime dateTime) {}
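On Java 8, a plain final class can stand in for the record. This sketch is an assumed equivalent, not code from the original answer: it keeps the accessor name dateTime() so the rest of the code compiles unchanged, and mimics the record's equals() and toString() format.

```java
import java.time.LocalDateTime;
import java.util.Objects;

// Java 8 stand-in for `record Post(LocalDateTime dateTime)`.
final class Post {
    private final LocalDateTime dateTime;

    public Post(LocalDateTime dateTime) {
        this.dateTime = dateTime;
    }

    // Same accessor name as the record component, so post.dateTime() keeps working.
    public LocalDateTime dateTime() {
        return dateTime;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Post && Objects.equals(dateTime, ((Post) o).dateTime);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(dateTime);
    }

    // Mimics the record's format, e.g. Post[dateTime=2022-04-28T15:00]
    @Override
    public String toString() {
        return "Post[dateTime=" + dateTime + "]";
    }
}
```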
Also, I've enhanced the PostsGroup class. My idea is that it should be capable of deciding whether an offered Post should be added to its list of posts or rejected, as the Information Expert principle suggests (in short: all manipulations with data should happen inside the class to which that data belongs).
To facilitate this functionality, two extra fields were added: interval, of type Duration from the java.time package, which represents the maximum interval between the earliest post and the latest post in a group; and intervalBound, of type LocalDateTime, which is initialized when the first post is added and is later used internally by the method isWithinInterval() to check whether an offered post fits into the interval.
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class PostsGroup {
    private final Duration interval;
    private LocalDateTime intervalBound; // exclusive upper bound, set when the first post is added
    private final List<Post> posts = new ArrayList<>();

    public PostsGroup(Duration interval) {
        this.interval = interval;
    }

    // Accepts the post if the group is empty or the post falls before intervalBound.
    public boolean tryAdd(Post post) {
        if (posts.isEmpty()) {
            intervalBound = post.dateTime().plus(interval);
            return posts.add(post);
        } else if (isWithinInterval(post)) {
            return posts.add(post);
        }
        return false;
    }

    public boolean isWithinInterval(Post post) {
        return post.dateTime().isBefore(intervalBound);
    }

    @Override
    public String toString() {
        return "PostsGroup{" + posts + '}';
    }
}
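Note that this PostsGroup closes a group once interval has elapsed since its first post, whereas the question asks for a new group only after X minutes without any posts. If the latter is what you need, a hypothetical variant could slide the bound forward with every accepted post. This sketch works on bare LocalDateTime values to stay self-contained, and the name GapPostsGroup is an assumption.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant: the group closes only after `interval` passes with no posts.
class GapPostsGroup {
    private final Duration interval;
    private LocalDateTime lastPostTime; // time of the most recently accepted post
    private final List<LocalDateTime> posts = new ArrayList<>();

    GapPostsGroup(Duration interval) {
        this.interval = interval;
    }

    boolean tryAdd(LocalDateTime postTime) {
        if (!posts.isEmpty() && postTime.isAfter(lastPostTime.plus(interval))) {
            return false; // gap too large: the caller should open a new group
        }
        lastPostTime = postTime; // slide the bound forward
        return posts.add(postTime);
    }

    List<LocalDateTime> posts() {
        return posts;
    }
}
```

With this rule and a 10-minute interval, the demo posts from 15:00 to 15:27 would all land in one group (no gap exceeds 9 minutes), and 15:48 would open a second one.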
I'm making two assumptions: the posts in the source are sorted by time (if that's not the case, you need to add a sorted() operation to the pipeline before collecting the results), and the stream is executed sequentially, since this grouping is treated as non-parallelizable.
We can create a custom collector either inline, by using one of the versions of the static method Collector.of(), or by defining a class that implements the Collector interface.
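As an alternative to Collector.of(), the same fixed-window rule can be packaged as a class implementing the Collector interface. This is a sketch assuming time-ordered input; it collects bare LocalDateTime values into lists instead of PostsGroup objects, the class name IntervalCollector is hypothetical, and the combiner throws because the grouping is not parallelizable.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical class-based collector: a window spans [first element, first element + interval).
class IntervalCollector
        implements Collector<LocalDateTime, Deque<List<LocalDateTime>>, List<List<LocalDateTime>>> {

    private final Duration interval;

    IntervalCollector(Duration interval) {
        this.interval = interval;
    }

    @Override
    public Supplier<Deque<List<LocalDateTime>>> supplier() {
        return ArrayDeque::new;
    }

    @Override
    public BiConsumer<Deque<List<LocalDateTime>>, LocalDateTime> accumulator() {
        return (deque, time) -> {
            // Open a new window if none exists or `time` is not before first + interval.
            if (deque.isEmpty() || !time.isBefore(deque.getLast().get(0).plus(interval))) {
                deque.addLast(new ArrayList<>());
            }
            deque.getLast().add(time);
        };
    }

    @Override
    public BinaryOperator<Deque<List<LocalDateTime>>> combiner() {
        return (left, right) -> {
            throw new UnsupportedOperationException("sequential streams only");
        };
    }

    @Override
    public Function<Deque<List<LocalDateTime>>, List<List<LocalDateTime>>> finisher() {
        return deque -> Collections.unmodifiableList(new ArrayList<>(deque)); // Java 8 compatible
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet(); // encounter order matters and a finisher is required
    }
}
```

Usage would be stream.collect(new IntervalCollector(Duration.ofMinutes(10))); the trade-off versus Collector.of() is more boilerplate in exchange for a reusable, named type.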
These parameters have to be provided when creating a custom collector:
Supplier (Supplier<A>) is meant to provide a mutable container which stores elements of the stream. In this case, an ArrayDeque (as an implementation of the Deque interface) is handy as a container because it gives convenient access to the most recently added element, i.e. the latest PostsGroup.
Accumulator (BiConsumer<A,T>) defines how to add elements into the container provided by the supplier. For this task, we need to provide the logic that determines whether the next element from the stream (i.e. the next Post) should go into the last PostsGroup in the Deque, or whether a new PostsGroup needs to be allocated for it.
Combiner (BinaryOperator<A> combiner()) establishes a rule for merging two containers obtained while executing the stream in parallel. Since this operation is treated as non-parallelizable, the combiner is implemented to throw an AssertionError in case of parallel execution.
Finisher (Function<A,R>) is meant to produce the final result by transforming the mutable container. The finisher function in the code below turns the container, a deque containing the result, into an immutable list.
Note: the Java 16 method Stream.toList() is used inside the finisher function. On Java 10+ it can be replaced with collect(Collectors.toUnmodifiableList()), and on Java 8 with collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList)), or simply collect(Collectors.toList()) if the result doesn't need to be unmodifiable.
Characteristics such as Collector.Characteristics.UNORDERED can optionally be passed as the last arguments; UNORDERED indicates that the collection operation does not commit to preserving the encounter order of input elements. This collector depends on encounter order, so it doesn't specify any characteristics. The method below is responsible for generating the collector based on the provided interval.
public static Collector<Post, ?, List<PostsGroup>> groupPostsByInterval(Duration interval) {
    return Collector.of(
            ArrayDeque::new, // supplier
            (Deque<PostsGroup> deque, Post post) -> { // accumulator
                // if no groups have been created yet, or the most recent group rejects the post
                if (deque.isEmpty() || !deque.getLast().tryAdd(post)) {
                    PostsGroup postsGroup = new PostsGroup(interval);
                    postsGroup.tryAdd(post);
                    deque.addLast(postsGroup);
                }
            },
            (Deque<PostsGroup> left, Deque<PostsGroup> right) -> { // combiner
                throw new AssertionError("should not be used in parallel");
            },
            (Deque<PostsGroup> deque) -> deque.stream().toList()); // finisher (Java 16+)
}
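Following the note about Java 8 compatibility, here is a sketch of a finisher that produces an unmodifiable list without Stream.toList(). FinisherOptions.java8Finisher is a hypothetical helper; in groupPostsByInterval an equivalent lambda would be passed as the fourth argument of Collector.of().

```java
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class FinisherOptions {
    // Java 8: copy the deque into a list and wrap it in an unmodifiable view.
    static <T> Function<Deque<T>, List<T>> java8Finisher() {
        return deque -> deque.stream()
                .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
    }
}
```

Attempting to add to the returned list throws UnsupportedOperationException, which matches the immutability of the Java 16 toList() result closely enough for this use.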
main() - demo
public static void main(String[] args) {
    List<Post> posts =
            List.of(new Post(LocalDateTime.of(2022, 4, 28, 15, 0)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 3)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 5)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 8)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 12)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 15)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 18)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 27)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 48)),
                    new Post(LocalDateTime.of(2022, 4, 28, 15, 54)));
    Duration interval = Duration.ofMinutes(10);
    List<PostsGroup> postsGroups = posts.stream()
            .collect(groupPostsByInterval(interval));
    postsGroups.forEach(System.out::println);
}
Output:
PostsGroup{[Post[dateTime=2022-04-28T15:00], Post[dateTime=2022-04-28T15:03], Post[dateTime=2022-04-28T15:05], Post[dateTime=2022-04-28T15:08]]}
PostsGroup{[Post[dateTime=2022-04-28T15:12], Post[dateTime=2022-04-28T15:15], Post[dateTime=2022-04-28T15:18]]}
PostsGroup{[Post[dateTime=2022-04-28T15:27]]}
PostsGroup{[Post[dateTime=2022-04-28T15:48], Post[dateTime=2022-04-28T15:54]]}
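The output above relies on the first assumption, that posts arrive in time order. The following self-contained sketch (the class SortedInputDemo is hypothetical and works on bare LocalDateTime values with the same fixed-window rule) shows that adding sorted() before collect() restores the expected grouping for shuffled input.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class SortedInputDemo {
    // Compact fixed-window collector over bare timestamps (same rule as groupPostsByInterval).
    static Collector<LocalDateTime, ?, List<List<LocalDateTime>>> byInterval(Duration interval) {
        return Collector.of(
                ArrayDeque::new,
                (Deque<List<LocalDateTime>> deque, LocalDateTime t) -> {
                    if (deque.isEmpty() || !t.isBefore(deque.getLast().get(0).plus(interval))) {
                        deque.addLast(new ArrayList<>()); // open a new window
                    }
                    deque.getLast().add(t);
                },
                (Deque<List<LocalDateTime>> left, Deque<List<LocalDateTime>> right) -> {
                    throw new AssertionError("should not be used in parallel");
                },
                (Deque<List<LocalDateTime>> deque) -> new ArrayList<>(deque));
    }

    static List<List<LocalDateTime>> group(Stream<LocalDateTime> times, boolean sortFirst) {
        // LocalDateTime is Comparable, so sorted() orders the timestamps chronologically.
        Stream<LocalDateTime> source = sortFirst ? times.sorted() : times;
        return source.collect(byInterval(Duration.ofMinutes(10)));
    }
}
```

For the shuffled minutes 12, 0, 5, 27, 3, the sorted pipeline gives three groups ([15:00, 15:03, 15:05], [15:12], [15:27]), while the unsorted one wrongly puts 15:00 and 15:05 into the window opened by 15:12.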
You can also play around with this online demo.