
Spark Scala Conditionally add to agg

Is it possible to add an aggregate conditionally in Spark Scala?

I would like to DRY out the following code by conditionally adding collect_set.

Example:

    val aggDf =
      if (addId)
        groups.agg(
          count(lit(1)).as("Count"),
          percentile_approx($"waitTime", lit(0.5), lit(10000)),
          collect_set("Id").as("Ids")
        )
      else
        groups.agg(
          count(lit(1)).as("Count"),
          percentile_approx($"waitTime", lit(0.5), lit(10000))
        )

Maybe there is a better way of writing the whole code.

Thanks.

You can store the aggregate columns in a sequence and alter the sequence as required:

// Columns that are always aggregated
var aggCols = Seq(
  count(lit(1)).as("Count"),
  percentile_approx($"waitTime", lit(0.5), lit(10000)))
// Append the optional column only when requested
if (addId) aggCols = aggCols :+ collect_set("Id").as("Ids")

// agg takes one Column plus varargs, so split the sequence into head and tail
val aggDf = groups.agg(aggCols.head, aggCols.tail: _*)
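
If you prefer to avoid the var, the same idea can be written immutably by building the full sequence up front. A minimal sketch, assuming the same groups and addId from the question, Spark 3.1+ (where percentile_approx is available as a built-in DataFrame function), and a MedianWaitTime alias added here purely for illustration:

    import org.apache.spark.sql.functions._ // count, lit, percentile_approx, collect_set
    // assumes import spark.implicits._ is in scope for the $"colName" syntax

    // Base aggregates that are always computed
    val baseCols = Seq(
      count(lit(1)).as("Count"),
      percentile_approx($"waitTime", lit(0.5), lit(10000)).as("MedianWaitTime") // alias is hypothetical
    )

    // Conditionally extend the sequence, then unpack it into agg
    val allCols = if (addId) baseCols :+ collect_set("Id").as("Ids") else baseCols
    val aggDf = groups.agg(allCols.head, allCols.tail: _*)

Since allCols always contains at least the two base columns, the head/tail split is safe here.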
