
In Spark Window functions, why do we need to use drop() at the end?

I'm new to Spark window functions, and I am implementing a few examples to learn more about them. Take a look at the example below. It uses drop() together with withColumn(). I searched the Spark docs as well, but couldn't understand its significance.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Get the top record in each subject, i.e. the one with the highest fee
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)
val dfTop = input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1).drop("rn") // Note: 'input' holds my data
dfTop.show()

Can someone explain the significance of drop()? What happens if I do not use drop()?

Thanks.

Why do we need to use drop() at the end?

We don't. We do it to remove a temporary column that no longer carries useful information.

What if I do not use drop()?

You'll have one more column, full of ones; nothing more, nothing less.
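To make that concrete, here is a minimal, self-contained sketch. The Subject/Name/Fee columns and their values are made-up assumptions standing in for 'input':

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder.master("local[*]").appName("drop-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for 'input'
val input = Seq(
  ("Math", "Alice", 300),
  ("Math", "Bob",   200),
  ("Art",  "Carol", 150)
).toDF("Subject", "Name", "Fee")

val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)

// Without drop(): the helper column survives, and after where($"rn" === 1)
// every remaining row holds rn = 1 (row order in the output may vary)
input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1).show()
// +-------+-----+---+---+
// |Subject| Name|Fee| rn|
// +-------+-----+---+---+
// |    Art|Carol|150|  1|
// |   Math|Alice|300|  1|
// +-------+-----+---+---+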

drop() is used to drop the column(s) you don't need any further; there is nothing of much significance to it.

You can see it for yourself simply by commenting out drop():

// Same as before, but with drop() commented out
val dfTop = input.withColumn("rn", row_number.over(wSpec)).where($"rn" === 1) //.drop("rn") // Note: 'input' holds my data
dfTop.show()
// the "rn" column is still there, full of ones

dfTop.drop("rn").show()
// the "rn" column is gone
