

How to do a groupby rank and add it as a column to an existing dataframe in Spark Scala?

Currently this is what I'm doing:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.dense_rank

  val new_df = old_df.groupBy("column1").count().withColumnRenamed("count", "column1_count")

  val new_df_rankings = new_df
    .withColumn(
      "column1_count_rank",
      dense_rank().over(Window.orderBy($"column1_count".desc)))
    .select("column1_count", "column1_count_rank")

But really all I'm looking to do is add a column called "column1_count_rank" to the original df (old_df), without going through all these intermediate steps and merging back.

Is there a way to do this?

Thanks and have a great day!

When you apply an aggregation, the computed result is a new dataframe. Can you give some sample input and expected output?

  old_df.groupBy("column1")
    .agg(count("*").alias("column1_count"))
    .withColumn(
      "column1_count_rank",
      dense_rank().over(Window.orderBy($"column1_count".desc)))
    .select("column1_count", "column1_count_rank")
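Since that snippet still produces a separate aggregated dataframe, the "no join" version of what the question asks for can be sketched with two window functions: a count over a window partitioned by column1 (which keeps every original row), then a dense_rank over those counts. A minimal sketch under assumed sample data for old_df (the names old_df and column1 come from the question; the values are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, dense_rank}

val spark = SparkSession.builder().master("local[1]").appName("rank-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for old_df.
val old_df = Seq("a", "a", "a", "b", "b", "c").toDF("column1")

// Per-group count without collapsing rows: window partitioned by column1.
val countWin = Window.partitionBy("column1")
// Rank those counts. Note: an orderBy window with no partitionBy pulls all
// rows into a single partition, which is fine here but costly on large data.
val rankWin = Window.orderBy(col("column1_count").desc)

val result = old_df
  .withColumn("column1_count", count("*").over(countWin))
  .withColumn("column1_count_rank", dense_rank().over(rankWin))

result.show()
```

Every row of old_df is preserved, each carrying its group's count and that count's dense rank, so no groupBy/join round trip is needed.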
