How to do a groupby rank and add it as a column to an existing dataframe in Spark Scala?
Currently this is what I'm doing:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.dense_rank

val new_df = old_df.groupBy("column1").count().withColumnRenamed("count", "column1_count")
val new_df_rankings = new_df
  .withColumn(
    "column1_count_rank",
    dense_rank().over(Window.orderBy($"column1_count".desc)))
  .select("column1_count", "column1_count_rank")
But really all I'm looking to do is add a column called "column1_count_rank" to the original df (old_df), without going through all these intermediate steps and merging back.
Is there a way to do this?
Thanks and have a great day!
When you apply an aggregation, the computed result is returned as a new dataframe. Can you give some sample input and expected output?
old_df
  .groupBy("column1")
  .agg(count("*").alias("column1_count"))
  .withColumn("column1_count_rank", dense_rank().over(Window.orderBy($"column1_count".desc)))
  .select("column1_count", "column1_count_rank")
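If the goal is to keep every row of old_df and simply attach the rank as an extra column (no groupBy/join round-trip), two window functions can do it in one pass: a count over a window partitioned by column1, then a dense_rank over that count. A minimal sketch, assuming the column names from the question and toy data invented here for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, dense_rank, desc}

val spark = SparkSession.builder().master("local[*]").appName("rank-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for old_df: "a" appears 3 times, "b" twice, "c" once.
val old_df = Seq("a", "a", "a", "b", "b", "c").toDF("column1")

// Per-group count, attached to every row without collapsing the dataframe.
val countWindow = Window.partitionBy("column1")
// Rank of that count across the whole dataframe (highest count = rank 1).
val rankWindow = Window.orderBy(desc("column1_count"))

val result = old_df
  .withColumn("column1_count", count("*").over(countWindow))
  .withColumn("column1_count_rank", dense_rank().over(rankWindow))
```

Note the trade-off: an orderBy window with no partitioning pulls all rows into a single partition to compute the rank (Spark logs a warning about this), so on large data the groupBy-then-join approach you were trying to avoid may actually scale better.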