Aggregation of multiple columns in Spark Java
I have a dynamic list of columns, priceColumns. I am trying to aggregate those columns on a Dataset:
public Dataset<Row> getAgg(RelationalGroupedDataset rlDataset) {
    Dataset<Row> selectedDS = null;
    for (String priceCol : priceColumns) {
        // each iteration overwrites selectedDS, so only the last column's aggregate survives
        selectedDS = rlDataset.agg(expr("sum(cast(" + priceCol + " as BIGINT))"));
    }
    return selectedDS;
}
The above code is improper: each loop iteration replaces the previous aggregation instead of accumulating one aggregate per column. What I am trying to do is aggregate the Dataset over every column in the list at once. How can I write generic code for this? I'm completely stuck here.
I tried the below way and it solved the problem.
List<Column> columnExpr = priceColumns.stream()
        .map(col -> expr("sum(cast(" + col + " as BIGINT))").as(col))
        .collect(Collectors.toList());
Then,
selectedDS = rlDataset
        .agg(columnExpr.get(0),
             JavaConverters.asScalaIteratorConverter(
                     columnExpr.subList(1, columnExpr.size()).iterator())
                 .asScala().toSeq());
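As a side note, the per-column SQL expression strings that get passed to `expr(...)` can be built and inspected without any Spark dependency. A minimal, self-contained sketch of that string construction (the column names `listPrice` and `salePrice` are hypothetical, and the `AS <col>` alias mirrors the `.as(col)` call in the answer):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AggExprBuilder {
    // Build one "sum(cast(<col> as BIGINT)) AS <col>" expression per column,
    // matching the strings handed to expr(...) in the answer above.
    static List<String> buildSumExprs(List<String> priceColumns) {
        return priceColumns.stream()
                .map(col -> "sum(cast(" + col + " as BIGINT)) AS " + col)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical column names for illustration
        List<String> cols = Arrays.asList("listPrice", "salePrice");
        System.out.println(buildSumExprs(cols));
    }
}
```

On the Spark side, the JavaConverters dance can also be avoided: `RelationalGroupedDataset.agg(Column expr, Column... exprs)` is a Java varargs overload, so something like `rlDataset.agg(columnExpr.get(0), columnExpr.subList(1, columnExpr.size()).toArray(new Column[0]))` should work as well.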