
In Spark Window functions, why do we need to use drop() at the end?

I'm new to Spark window functions, and I am implementing a few examples to learn more about them. Take a look at the example below. It uses drop() together with withColumn(). I searched the Spark docs as well, but couldn't understand its significance.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Get the top record in each subject, i.e. the one with the highest fee
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)
val dfTop = input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1).drop("rn") // Note: 'input' holds my data
dfTop.show()

Can someone explain the significance of drop()? What happens if I do not use drop()?

Thanks.

Why do we need to use drop() at the end?

We don't. We do it to remove a temporary column that no longer carries useful information.

What if I do not use drop()?

You'll have one more column, full of ones; nothing more, nothing less.
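To make that concrete, here is a minimal, self-contained sketch. The Subject/Name/Fee columns and their values are made-up assumptions standing in for 'input':

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder.master("local[*]").appName("drop-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for 'input'
val input = Seq(
  ("Math", "Alice", 300),
  ("Math", "Bob",   200),
  ("Art",  "Carol", 150)
).toDF("Subject", "Name", "Fee")

val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)

// Without drop(): the helper column survives, and after where($"rn" === 1)
// every remaining row holds rn = 1 (row order in the output may vary)
input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1).show()
// +-------+-----+---+---+
// |Subject| Name|Fee| rn|
// +-------+-----+---+---+
// |    Art|Carol|150|  1|
// |   Math|Alice|300|  1|
// +-------+-----+---+---+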

drop() is used to drop the column(s) you don't need any further; there is nothing of much significance to it.

You can see it for yourself simply by commenting out drop():

// Same as before, but with drop() commented out
val dfTop = input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1) //.drop("rn") // Note: 'input' holds my data
dfTop.show()
// the "rn" column is still there, full of ones

dfTop.drop("rn").show()
// the "rn" column is gone
