User Defined Aggregate Functions Spark Java - merge problem

I'm trying to use a User Defined Aggregate Function, following the documentation here. I want to first pass two values, x and y, to a SimpleRegression, then merge the SimpleRegression instances by calling append. My problem is that the reduce function receives the values correctly (if I print x and y there, they are correct). However, when I check in the merge function how many values have been added to the regressors (using the getN() method, as shown in the code provided), it reports that no values have been added, as if addData() had never been called on them. Why does this happen? What am I doing wrong?

Clearly this does not allow me to do what I want, which is to obtain the slope and the intercept of each regression line: when the finish function is executed the regressors are empty, so the slope and intercept are set to NaN.

This is my code (Java):

public static class RegressorAggregator extends Aggregator<Tuple2<Long, Long>, SimpleRegressionWrapper, LineParameters> {


    // Zero value for the aggregation - should satisfy a + zero = a
    public SimpleRegressionWrapper zero(){
        return new SimpleRegressionWrapper();
    }

    public SimpleRegressionWrapper reduce(SimpleRegressionWrapper simpleRegression, Tuple2<Long, Long> xy){
        double x = (double)xy._1;
        double y = (double)xy._2;
        simpleRegression.addData(x,y);
        return simpleRegression;
    }

    public SimpleRegressionWrapper merge(SimpleRegressionWrapper a, SimpleRegressionWrapper b){
        Logger log = LogManager.getLogger(getClass().getSimpleName());
        log.error(a.getN() + " " + b.getN());
        a.append(b);
        return a;
    }

    public LineParameters finish(SimpleRegressionWrapper simpleRegression){
        return new LineParameters(simpleRegression.getSlope(), simpleRegression.getIntercept());
    }

    public Encoder<SimpleRegressionWrapper> bufferEncoder(){
        return Encoders.bean(SimpleRegressionWrapper.class);
    }

    public Encoder<LineParameters> outputEncoder(){
        return Encoders.bean(LineParameters.class);
    }

}

The problem can be resolved by changing this:

public Encoder<SimpleRegressionWrapper> bufferEncoder(){
        return Encoders.bean(SimpleRegressionWrapper.class);
    }

into this:

public Encoder<SimpleRegressionWrapper> bufferEncoder(){
        return Encoders.javaSerialization(SimpleRegressionWrapper.class);
    }
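
A likely reason the change helps, offered as a hedged reading rather than a confirmed diagnosis: Encoders.bean builds its encoder from JavaBean properties (getter/setter pairs), so any buffer state that is not exposed that way would not survive when Spark encodes and decodes the buffer between reduce and merge. Encoders.javaSerialization instead serializes the whole object, private fields included. The sketch below imitates the two behaviours in plain Java without Spark. RunningSums is a hypothetical stand-in for SimpleRegressionWrapper (whose source is not shown in the question), and copyViaBeanProperties only approximates what a bean-based encoder does.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BufferRoundTripDemo {

    /** Hypothetical stand-in for SimpleRegressionWrapper: all state is private. */
    public static class RunningSums implements Serializable {
        private static final long serialVersionUID = 1L;
        private long n;
        private double sumX;
        private double sumY;

        public RunningSums() { }

        public void addData(double x, double y) {
            n++;
            sumX += x;
            sumY += y;
        }

        // Getter with no matching setter, so "n" is not a read/write bean property.
        public long getN() {
            return n;
        }
    }

    /** Copies only properties that have both a getter and a setter (bean-style). */
    public static RunningSums copyViaBeanProperties(RunningSums source) {
        try {
            RunningSums copy = new RunningSums();
            PropertyDescriptor[] props = Introspector
                    .getBeanInfo(RunningSums.class, Object.class)
                    .getPropertyDescriptors();
            for (PropertyDescriptor p : props) {
                if (p.getReadMethod() != null && p.getWriteMethod() != null) {
                    p.getWriteMethod().invoke(copy, p.getReadMethod().invoke(source));
                }
            }
            return copy;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies the whole object graph, private fields included. */
    public static RunningSums copyViaJavaSerialization(RunningSums source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RunningSums) in.readObject();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RunningSums sums = new RunningSums();
        sums.addData(1.0, 2.0);
        sums.addData(2.0, 4.0);
        System.out.println("bean-property copy n = " + copyViaBeanProperties(sums).getN());
        System.out.println("serialization copy n = " + copyViaJavaSerialization(sums).getN());
    }
}
```

In this sketch the bean-property copy reports n = 0 while the serialized copy reports n = 2, which mirrors the empty regressors seen in merge. If the same cause applies to SimpleRegressionWrapper, the class (and the objects it holds) must implement java.io.Serializable for Encoders.javaSerialization to work.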
