
Equivalent to Java summaryStatistics in Scala

I am porting some Java code to Scala and need to extract some really basic statistical values, namely the count, maximum, minimum and average, from a stream of long values.

In Java, I solved this problem with the following method:

import static java.lang.Math.round;
import java.util.LongSummaryStatistics;
import java.util.stream.StreamSupport;

public static Stats calcStats(Iterable<Ad> iterable) {
    LongSummaryStatistics longSummaryStatistics = StreamSupport.stream(iterable.spliterator(), false)
            .mapToLong(Ad::getEvent_time).summaryStatistics();
    return new Stats(longSummaryStatistics.getMin(), longSummaryStatistics.getMax(),
            round(longSummaryStatistics.getAverage()), longSummaryStatistics.getCount());
}

Is there a similar method in the Scala libraries to extract these values in one go (without using extra libraries like Spark)?
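For comparison, a minimal sketch of the naive approach using only standard Scala collection methods: min, max, sum and size are separate calls, so this version needs a materialized collection (not a one-shot Iterator) and walks it four times. The object name NaiveStats and the tuple layout (min, max, avg, count) are illustrative assumptions, not part of the question.

```scala
object NaiveStats {
  // Each of min, max, sum and size traverses the whole collection on its own,
  // so the values are read four times.
  def stats(values: Seq[Long]): (Long, Long, Double, Long) = {
    // min and max throw on an empty collection, so reject that case up front.
    require(values.nonEmpty, "min/max are undefined for an empty collection")
    val count = values.size.toLong
    (values.min, values.max, values.sum.toDouble / count, count)
  }

  def main(args: Array[String]): Unit = {
    val (min, max, avg, count) = stats(List(1L, 2L, 3L, 4L))
    println(s"min: $min, max: $max, avg: $avg, count: $count")
  }
}
```

This is why a single-pass fold, as in the code below, is usually preferred for large or streaming inputs.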

Right now I am using code similar to this:

def main(args: Array[String]): Unit = {
  val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
  val stats = summaryStatistics(l.iterator)
  println("min: %d, max: %d, avg: %f".format(stats._1, stats._2, stats._3))
}

// Single pass over the iterator; the fold accumulator is (min, max, count, sum)
def summaryStatistics(iter: Iterator[(String, Long)]): (Long, Long, Double) = {
  val stats = iter.map(_._2)
    .foldLeft((Long.MaxValue, Long.MinValue, 0L, 0L))((a, t) => (Math.min(t, a._1), Math.max(t, a._2), a._3 + 1, a._4 + t))
  (stats._1, stats._2, stats._4.toDouble / stats._3)
}

This prints:

min: 1, max: 4, avg: 2.500000
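A pure-Scala sketch of the same single-pass fold that also returns the count (which the Java version reports but the tuple above drops). It signals an empty input explicitly instead of returning Long.MaxValue, Long.MinValue and NaN. FoldStats and its Stats case class are hypothetical names chosen for illustration.

```scala
object FoldStats {
  // Hypothetical result type mirroring the min/max/average/count of the Java Stats.
  final case class Stats(min: Long, max: Long, avg: Double, count: Long)

  // Single pass over a one-shot Iterator; the accumulator is (min, max, count, sum).
  def summaryStatistics(values: Iterator[Long]): Option[Stats] = {
    val (min, max, count, sum) =
      values.foldLeft((Long.MaxValue, Long.MinValue, 0L, 0L)) {
        case ((mn, mx, n, s), v) => (math.min(mn, v), math.max(mx, v), n + 1, s + v)
      }
    // An empty input has no meaningful min, max or average.
    if (count == 0L) None else Some(Stats(min, max, sum.toDouble / count, count))
  }

  def main(args: Array[String]): Unit = {
    val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
    summaryStatistics(l.iterator.map(_._2)) match {
      case Some(s) => println("min: %d, max: %d, avg: %f, count: %d".format(s.min, s.max, s.avg, s.count))
      case None    => println("no values")
    }
  }
}
```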

You can use the Java library directly by taking a short detour through the Java world :)

import java.util.stream.StreamSupport
import scala.collection.JavaConverters._

def main(args: Array[String]): Unit = {
    val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
    // asJava wraps the Scala List as a java.util.List, which exposes spliterator()
    val stats = StreamSupport.stream(l.asJava.spliterator(), false).mapToLong(x => x._2).summaryStatistics()
    println("min: %d, max: %d, avg: %f".format(stats.getMin, stats.getMax, stats.getAverage))
}

Note the import of JavaConverters and the small "asJava" call added to the code so that it matches the StreamSupport API :) (In Scala 2.13, JavaConverters is deprecated in favor of scala.jdk.CollectionConverters, which provides the same asJava.)
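If the values are already in memory, a sketch that skips the collection converters entirely: copy them into an Array[Long], which is a JVM long[], and call java.util.Arrays.stream, whose long[] overload returns a LongStream with summaryStatistics(). The object name ArrayStreamStats is hypothetical.

```scala
import java.util.{Arrays, LongSummaryStatistics}

object ArrayStreamStats {
  def summaryStatistics(values: Seq[Long]): LongSummaryStatistics = {
    // Array[Long] compiles to a primitive long[], so the Arrays.stream(long[])
    // overload is selected and no boxing happens inside the stream.
    val arr: Array[Long] = values.toArray
    Arrays.stream(arr).summaryStatistics()
  }

  def main(args: Array[String]): Unit = {
    val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
    val stats = summaryStatistics(l.map(_._2))
    println("min: %d, max: %d, avg: %f".format(stats.getMin, stats.getMax, stats.getAverage))
  }
}
```

The trade-off is an extra array copy; for a large or one-shot source, folding directly into the statistics object avoids that allocation.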

As an alternative to C4stor's answer, you can rely more on Scala collections, like this:

import java.util.LongSummaryStatistics

def main(args: Array[String]): Unit = {
  val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
  // .view here is a trick to make it semantically more similar to Java Streams, i.e. to avoid materialization of the mapped list
  val stats = summaryStatistics(l.view.map(_._2))
  println("min: %d, max: %d, avg: %f".format(stats.getMin, stats.getMax, stats.getAverage))
}


def summaryStatistics(col: TraversableOnce[Long]): LongSummaryStatistics = {
  col.foldLeft(new LongSummaryStatistics)((stat, el) => {
    stat.accept(el)
    stat
  })
}
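One pitfall with the fold above: for an empty input, LongSummaryStatistics reports getMin = Long.MaxValue, getMax = Long.MinValue and getAverage = 0.0, so getCount should be checked first. The sketch below shows that check. It takes an Iterator[Long] instead of TraversableOnce, an assumption made because, as far as I know, TraversableOnce is deprecated in Scala 2.13; SafeStats and minMaxAvg are hypothetical names.

```scala
import java.util.LongSummaryStatistics

object SafeStats {
  // Same fold as above. LongSummaryStatistics is mutable: accept updates it
  // in place, so each step returns the same instance.
  def summaryStatistics(values: Iterator[Long]): LongSummaryStatistics =
    values.foldLeft(new LongSummaryStatistics) { (stat, v) =>
      stat.accept(v)
      stat
    }

  // Hypothetical helper: an empty input yields the sentinel values
  // Long.MaxValue / Long.MinValue / 0.0, so return None in that case.
  def minMaxAvg(values: Iterator[Long]): Option[(Long, Long, Double)] = {
    val s = summaryStatistics(values)
    if (s.getCount == 0L) None else Some((s.getMin, s.getMax, s.getAverage))
  }

  def main(args: Array[String]): Unit = {
    println(minMaxAvg(Iterator(3L, 1L, 2L)))
    println(minMaxAvg(Iterator.empty))
  }
}
```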

Or, if you want to take advantage of the parallel support built into LongSummaryStatistics (its combine method merges two partial results), you can use aggregate instead of foldLeft. On a sequential collection aggregate simply behaves like foldLeft; the combine step only runs on parallel collections. For example:

def summaryStatistics(col: TraversableOnce[Long]): LongSummaryStatistics = {
  col.aggregate(new LongSummaryStatistics)((stat, el) => {
    stat.accept(el)
    stat
  }, (s1, s2) => {
    s1.combine(s2)
    s1
  })
}
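To see what the combiner passed to aggregate contributes, a sketch that simulates the parallel case by hand: each chunk is summarized independently, as the first function does, and the partial results are merged with LongSummaryStatistics.combine, as the second function does. CombineDemo, statsOf and chunkedStats are hypothetical names, and the fixed chunk size is an assumption; a real parallel collection chooses its own split points.

```scala
import java.util.LongSummaryStatistics

object CombineDemo {
  // Summarizes one chunk of values (the role of aggregate's first function).
  def statsOf(chunk: Seq[Long]): LongSummaryStatistics = {
    val s = new LongSummaryStatistics
    chunk.foreach(v => s.accept(v))
    s
  }

  // Summarizes each chunk separately, then merges the partial results with
  // combine (the role of aggregate's second function). chunkSize must be >= 1.
  def chunkedStats(values: Seq[Long], chunkSize: Int): LongSummaryStatistics =
    values.grouped(chunkSize).map(statsOf).foldLeft(new LongSummaryStatistics) { (acc, part) =>
      acc.combine(part)
      acc
    }

  def main(args: Array[String]): Unit = {
    val s = chunkedStats(List(5L, 1L, 9L, 3L, 7L), 2)
    println("min: %d, max: %d, avg: %f, count: %d".format(s.getMin, s.getMax, s.getAverage, s.getCount))
  }
}
```

The merged result matches a single sequential pass, which is what makes it safe for aggregate to split the work.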

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.