Can I check if a Java 8 stream contains at least n elements?
I have a Java 8 stream from which I want to select an element uniformly at random. The stream can contain anywhere from zero to tens of thousands of elements.
I have implemented an algorithm that selects one using a MapReduce-like pattern, but for very small streams it would probably be more efficient to just collect the items into a List and return the one at a random index. For that I have to count them, however. Streams do have a count() method, but that counts every element. I'm not really interested in the actual count; all I care about is whether the stream contains more than some to-be-determined number of elements. Does anyone know if such a method exists? I can't find it, but there might be something I'm overlooking, or some clever trick for finding it out anyway.
PS: I'm aware that sometimes it's not necessary to optimize code, but I would like to try it nonetheless, just for the experience. I'm a student.
PPS: I've copied my algorithm here in case anyone's interested (or wants to look for bugs; I haven't tested it yet ;-)
stream
    .parallel()
    .map(t -> new Pair<T, Integer>(t, 1))
    .reduce((Pair<T, Integer> t, Pair<T, Integer> u) -> {
        if (rand.nextDouble() <= (t.getValue1() / (double) (t.getValue1() + u.getValue1()))) {
            return new Pair<>(t.getValue0(), t.getValue1() + u.getValue1());
        } else {
            return new Pair<>(u.getValue0(), t.getValue1() + u.getValue1());
        }
    })
    .map(t -> t.getValue0());
(The pairs are from org.javatuples. Now that Java supports functional-programming-style interfaces, the lack of tuples does become a bit painful.)
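One possible way to express the "at least n elements" check from the question is to cap the stream with limit(n) before counting, so that no more than n elements are ever pulled. This is only a sketch: the class and method names (AtLeastN, hasAtLeast) are made up for illustration, and the check consumes the stream, so it assumes the source can be streamed again afterwards.

```java
import java.util.stream.Stream;

class AtLeastN {
    // Returns true if the stream yields at least n elements.
    // limit(n) stops the pipeline after n elements, so a large (or even
    // infinite) stream is not traversed in full.
    // Note: this is a terminal operation; the stream cannot be reused.
    static <T> boolean hasAtLeast(Stream<T> stream, long n) {
        return stream.limit(n).count() == n;
    }
}
```

For example, AtLeastN.hasAtLeast(list.stream(), 100) could decide between the collect-to-List approach and the reduce-based one, with a fresh list.stream() created for the actual selection.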
Your code does not return an element from a uniform distribution. The result depends on the order in which the stream hands elements to the reduce method, and in general you cannot assume that this order is not a special one. To solve your task: if you have enough memory, you can write a RandomComparator (one that saves its previous results in a Map), sort your stream with this comparator, and take the first element (don't use findAny). If the stream is too large, you can sample it with a RandomFilter.
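A related variant of the random-ordering idea, sketched here under assumptions, tags each element with an independent random key and takes the element with the smallest key. This is not the RandomComparator described above: it swaps the Map and the sort for a single min() pass. The names RandomKeyPick and pick are hypothetical, and elements are assumed to be non-null.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Stream;

class RandomKeyPick {
    // Pairs every element with a random double and returns the element
    // whose key is smallest. Because the keys are independent and
    // identically distributed, each element is equally likely to win.
    static <T> Optional<T> pick(Stream<T> stream) {
        return stream
                .<Map.Entry<Double, T>>map(t ->
                        new SimpleImmutableEntry<>(ThreadLocalRandom.current().nextDouble(), t))
                .min(Map.Entry.comparingByKey())
                .map(Map.Entry::getValue);
    }
}
```

The result is an empty Optional for an empty stream. Since the key does not depend on encounter order, the same code should also work on a parallel stream.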
By the way, if your stream has the SIZED flag, the task is trivial: just get the size, generate a random index, and skip to it :)
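The SIZED shortcut might look roughly like the sketch below. It asks the stream's spliterator for its exact size and, if one is reported, skips to a random index. SizedPick and pick are illustrative names only, elements are assumed to be non-null, and returning an empty Optional when the size is unknown is a choice made for this example.

```java
import java.util.Optional;
import java.util.Spliterator;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class SizedPick {
    // Picks a uniformly random element when the exact size is known.
    // Returns empty for an empty stream and also when the size is
    // unknown (for example after a filter), so the caller can fall back.
    static <T> Optional<T> pick(Stream<T> stream) {
        Spliterator<T> split = stream.spliterator();
        long size = split.getExactSizeIfKnown(); // -1 if not SIZED
        if (size <= 0) {
            return Optional.empty();
        }
        long index = ThreadLocalRandom.current().nextLong(size);
        // Rebuild a sequential stream over the same spliterator and skip ahead.
        return StreamSupport.stream(split, false).skip(index).findFirst();
    }
}
```

Streams created directly from a collection such as an ArrayList normally report an exact size, while operations like filter clear the SIZED flag.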
I suggest trying to get this information from the data source of the stream. Where do you get the data for the stream from? If the source (some collection, for example) can give you the number of elements, you're set. If it's some producer function, check what it does and whether it's possible to estimate the size up front.
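As a small illustration of asking the source rather than the stream, the sketch below assumes the data happens to live in a List. The names SourcePick, hasAtLeast, and pick are invented for this example.

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

class SourcePick {
    // The collection already knows its size, so no stream is consumed.
    static boolean hasAtLeast(Collection<?> source, int n) {
        return source.size() >= n;
    }

    // Picks a uniformly random element by index; empty if the list is empty.
    static <T> Optional<T> pick(List<T> source) {
        if (source.isEmpty()) {
            return Optional.empty();
        }
        int index = ThreadLocalRandom.current().nextInt(source.size());
        return Optional.ofNullable(source.get(index));
    }
}
```

Indexing is cheap for random-access lists such as ArrayList. For a LinkedList, get(index) walks the list, so this would be slower.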
The moment I type "stream", I normally start thinking of a "recipe" for what I want to do with the data, rather than of the actual data. I think that's close to the way streams are designed (which explains why they don't provide a way to count elements without consuming the stream).
Best regards, Dido