
Aggregation of multiple columns in Spark Java

I have a dynamic list of column names, `priceColumns`. I am trying to aggregate those columns in a `Dataset`:

public Dataset<Row> getAgg(RelationalGroupedDataset rlDataset) {
    Dataset<Row> selectedDS = null;
    for (String priceCol : priceColumns) {
        // Each iteration overwrites selectedDS, so only the last column's aggregate survives.
        selectedDS = rlDataset.agg(expr("sum(cast(" + priceCol + " as BIGINT))"));
    }
    return selectedDS;
}

The above code is improper: it overwrites the result on every loop iteration. What I am trying to do is have the aggregation happen on the Dataset for every column present. How can I write generic code for this? I'm completely stuck here.

I tried the approach below and it solved the problem.

// Build one aggregate expression per price column, aliased back to the column name.
List<Column> columnExpr = priceColumns.stream()
        .map(col -> expr("sum(cast(" + col + " as BIGINT))").as(col))
        .collect(Collectors.toList());

Then:

// agg(Column, Column...) takes a head column plus the rest as a Scala Seq,
// so convert the tail of the list with JavaConverters.
selectedDS = rlDataset.agg(
        columnExpr.get(0),
        JavaConverters.asScalaIteratorConverter(
                columnExpr.subList(1, columnExpr.size()).iterator())
            .asScala().toSeq());
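The core of the fix is the head-plus-tail split required by Spark's varargs `agg(Column expr, Column... exprs)` signature. That expression-building and splitting step can be sketched in plain Java without a Spark session; the column names and the `buildSumExprs` helper below are made up for illustration, using plain `String`s in place of Spark `Column` objects:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AggExprDemo {

    // Build one "sum(cast(col as BIGINT)) AS col" expression string per price column,
    // mirroring what expr(...).as(col) produces in the Spark answer above.
    static List<String> buildSumExprs(List<String> priceColumns) {
        return priceColumns.stream()
                .map(col -> "sum(cast(" + col + " as BIGINT)) AS " + col)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> priceColumns = Arrays.asList("listPrice", "salePrice", "tax");
        List<String> exprs = buildSumExprs(priceColumns);

        // Head + tail split mirrors the agg(Column, Column...) call:
        // the first expression is passed on its own, the rest as a sequence.
        String head = exprs.get(0);
        List<String> tail = exprs.subList(1, exprs.size());

        System.out.println(head); // sum(cast(listPrice as BIGINT)) AS listPrice
        System.out.println(tail.size()); // 2
    }
}
```

Note that because the Spark method is a true Java varargs method, an alternative to the `JavaConverters` round-trip may be to pass the tail as an array, e.g. `columnExpr.subList(1, columnExpr.size()).toArray(new Column[0])`; which form compiles cleanly can depend on the Spark version.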
