
Java Streams - Standard Deviation

I want to clarify up front that I am looking for a way to calculate the standard deviation (SD) using Streams. I already have a working method that calculates and returns the SD, but it does not use Streams.

The dataset I am working with closely matches the one shown in Link. As shown in that link, I am able to group my data and get the average, but I cannot figure out how to get the SD.

Code

outPut.stream()
            .collect(Collectors.groupingBy(e -> e.getCar(),
                    Collectors.averagingDouble(e -> (e.getHigh() - e.getLow()))))
            .forEach((car,avgHLDifference) -> System.out.println(car+ "\t" + avgHLDifference));

I also checked Link on DoubleSummaryStatistics, but it does not seem to help with the SD.

You can use a custom collector for this task that calculates the sum of squares. The built-in DoubleSummaryStatistics collector does not keep track of it. This was discussed by the expert group in this thread, but it was ultimately not implemented. The difficulty when calculating the sum of squares is the potential overflow when squaring the intermediate results.
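The identity behind this approach is that the population variance equals the mean of the squares minus the square of the mean. As a minimal sketch of that identity only (the class name `SumOfSquaresSketch` and the sample values are illustrative assumptions, and no overflow or rounding compensation is attempted here), the same quantity can be computed with plain stream passes:

```java
import java.util.stream.DoubleStream;

// Hypothetical helper for illustration only; not part of the original answer.
class SumOfSquaresSketch {

    // Population standard deviation: sqrt(E[x^2] - (E[x])^2).
    static double populationStdDev(double[] values) {
        if (values.length == 0) {
            return 0.0d;
        }
        double mean = DoubleStream.of(values).average().orElse(0.0d);
        double meanOfSquares = DoubleStream.of(values).map(v -> v * v).sum() / values.length;
        // Rounding can push the difference slightly below zero, so clamp it.
        return Math.sqrt(Math.max(0.0d, meanOfSquares - mean * mean));
    }

    public static void main(String[] args) {
        double[] sample = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(populationStdDev(sample)); // prints 2.0
    }
}
```

The custom collector that follows applies the same formula, but accumulates the count, the sum and the sum of squares in a single pass so that it can be used as a downstream collector.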

static class DoubleStatistics extends DoubleSummaryStatistics {

    private double sumOfSquare = 0.0d;
    private double sumOfSquareCompensation; // Low order bits of sum
    private double simpleSumOfSquare; // Used to compute right sum for non-finite inputs

    @Override
    public void accept(double value) {
        super.accept(value);
        double squareValue = value * value;
        simpleSumOfSquare += squareValue;
        sumOfSquareWithCompensation(squareValue);
    }

    public DoubleStatistics combine(DoubleStatistics other) {
        super.combine(other);
        simpleSumOfSquare += other.simpleSumOfSquare;
        sumOfSquareWithCompensation(other.sumOfSquare);
        // The compensation term holds the negated low-order bits, so subtract it
        sumOfSquareWithCompensation(-other.sumOfSquareCompensation);
        return this;
    }

    private void sumOfSquareWithCompensation(double value) {
        double tmp = value - sumOfSquareCompensation;
        double velvel = sumOfSquare + tmp; // Little wolf of rounding error
        sumOfSquareCompensation = (velvel - sumOfSquare) - tmp;
        sumOfSquare = velvel;
    }

    public double getSumOfSquare() {
        // The compensation term holds the negated low-order bits, so subtract it
        double tmp = sumOfSquare - sumOfSquareCompensation;
        if (Double.isNaN(tmp) && Double.isInfinite(simpleSumOfSquare)) {
            return simpleSumOfSquare;
        }
        return tmp;
    }

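    // Population standard deviation: sqrt(E[x^2] - (E[x])^2)
    public final double getStandardDeviation() {
        if (getCount() == 0) return 0.0d;
        double variance = getSumOfSquare() / getCount() - Math.pow(getAverage(), 2);
        return Math.sqrt(Math.max(0.0d, variance)); // Rounding can make the difference slightly negative
    }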

}

Then, you can use this class with:

Map<String, Double> standardDeviationMap =
    list.stream()
        .collect(Collectors.groupingBy(
            e -> e.getCar(),
            Collectors.mapping(
                e -> e.getHigh() - e.getLow(),
                Collector.of(
                    DoubleStatistics::new,
                    DoubleStatistics::accept,
                    DoubleStatistics::combine,
                    d -> d.getStandardDeviation()
                )
            )
        ));

This collects the input list into a map whose values are the standard deviation of high - low for each key.
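One way to sanity-check the resulting map is to compare it with a straightforward two-pass computation per group. The sketch below is only an illustration under stated assumptions: the `Quote` class, the `double` type of its accessors and the helper names are hypothetical stand-ins for the question's data, and the downstream collector buffers each group in a list instead of accumulating in one pass.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical cross-check; not part of the original answer.
class GroupedStdDevCheck {

    // Hypothetical stand-in for the question's element type.
    static final class Quote {
        private final String car;
        private final double high;
        private final double low;

        Quote(String car, double high, double low) {
            this.car = car;
            this.high = high;
            this.low = low;
        }

        String getCar() { return car; }
        double getHigh() { return high; }
        double getLow() { return low; }
    }

    // Two-pass population standard deviation: mean first, then mean squared deviation.
    static double populationStdDev(List<Double> values) {
        double mean = values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0d);
        double variance = values.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average()
                .orElse(0.0d);
        return Math.sqrt(variance);
    }

    // Groups by car, then reduces each group's (high - low) values to a standard deviation.
    static Map<String, Double> stdDevByCar(List<Quote> quotes) {
        return quotes.stream()
                .collect(Collectors.groupingBy(
                        Quote::getCar,
                        Collectors.mapping(
                                q -> q.getHigh() - q.getLow(),
                                Collectors.collectingAndThen(
                                        Collectors.toList(),
                                        GroupedStdDevCheck::populationStdDev))));
    }
}
```

For small inputs both approaches should agree up to floating-point rounding; the single-pass collector avoids holding every group's values in memory.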

Alternatively, you can use this custom Collector:

private static final Collector<Double, double[], Double> VARIANCE_COLLECTOR = Collector.of( // See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
        () -> new double[3], // {count, mean, M2}
        (acu, d) -> { // See chapter about Welford's online algorithm and https://math.stackexchange.com/questions/198336/how-to-calculate-standard-deviation-with-streaming-inputs
            acu[0]++; // Count
            double delta = d - acu[1];
            acu[1] += delta / acu[0]; // Mean
            acu[2] += delta * (d - acu[1]); // M2
        },
        (acuA, acuB) -> { // See chapter about "Parallel algorithm" : only called if stream is parallel ...
            double delta = acuB[1] - acuA[1];
            double count = acuA[0] + acuB[0];
            if (count == 0) return acuA; // Both partitions empty: avoid 0/0 producing NaN
            acuA[2] = acuA[2] + acuB[2] + delta * delta * acuA[0] * acuB[0] / count; // M2
            acuA[1] += delta * acuB[0] / count;  // Mean
            acuA[0] = count; // Count
            return acuA;
        },
        acu -> acu[2] / (acu[0] - 1.0), // Var = M2 / (count - 1)
        Collector.Characteristics.UNORDERED); // Fully qualified so no static import is needed

Then simply call this collector on your stream:

double stdDev = Math.sqrt(outPut.stream().mapToDouble(e -> e.getHigh() - e.getLow()).boxed().collect(VARIANCE_COLLECTOR));
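Note that this collector divides M2 by count - 1, so it yields the sample variance, whereas the DoubleStatistics class above divides by the count and yields the population standard deviation. The following minimal sketch of Welford's online algorithm exposes both variants side by side; the class `WelfordSketch` and its method names are hypothetical and exist only for this illustration.

```java
import java.util.stream.DoubleStream;

// Hypothetical accumulator for illustration; not part of the original answers.
class WelfordSketch {
    private long count;
    private double mean;
    private double m2; // Sum of squared deviations from the running mean

    // Welford's update for a single value.
    void accept(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
    }

    // Merges another partial result (used when the stream is parallel).
    void combine(WelfordSketch other) {
        if (other.count == 0) {
            return; // Nothing to merge; also avoids dividing by zero
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * count * other.count / total;
        mean += delta * other.count / total;
        count = total;
    }

    // Divides by n.
    double populationStdDev() {
        return count > 0 ? Math.sqrt(m2 / count) : 0.0d;
    }

    // Divides by n - 1; returns 0.0 by convention here when fewer than two values were seen.
    double sampleStdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0d;
    }

    static WelfordSketch of(DoubleStream values) {
        return values.collect(WelfordSketch::new, WelfordSketch::accept, WelfordSketch::combine);
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sum of squared deviations is 32, so the population figure is sqrt(32 / 8) = 2 and the sample figure is sqrt(32 / 7), roughly 2.138. Which of the two is appropriate depends on whether the data is the whole population or a sample of it.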
