In Spark window functions, why do we need to use drop() at the end?
I'm new to Spark window functions and am implementing a few examples to learn more about them. Take a look at the example below. It uses drop() together with withColumn(). I searched the Spark docs at length but could not understand its significance.
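import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// The $"col" syntax also needs: import spark.implicits._ (assuming a SparkSession named spark)

// Get the top record in each subject with the highest fee
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)
val dfTop = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1).drop("rn") // Note: 'input' has my data
dfTop.show()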
Can someone explain the significance of drop()? What happens if I do not use drop()?
Thanks.
Why do we need to use drop() at the end?
We don't. We do it to remove a temporary column that no longer carries useful information.
What if I do not use drop()?
You'll have one more column, rn, which holds 1 in every row because of the where($"rn"===1) filter. Nothing more, nothing less.
drop() simply removes the column(s) you no longer need; it has no further significance.
You can see it for yourself:
// Comment out drop() so the "rn" column is kept
val dfTop = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1) //.drop("rn") // Note: 'input' has my data
dfTop.show()
dfTop.drop("rn").show()
//"rn" column is gone
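For anyone who wants to try this without their own data, here is a minimal self-contained sketch that can be pasted into spark-shell. The sample rows, the Teacher column, and the names withRn and the app name are made up for illustration; only Subject, Fee, rn, wSpec and dfTop come from the question. It assumes a local Spark installation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Reuses the existing session in spark-shell, or creates a local one otherwise
val spark = SparkSession.builder().master("local[*]").appName("drop-rn-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for 'input'
val input = Seq(
  ("Math", "Alice", 300),
  ("Math", "Bob", 500),
  ("Physics", "Carol", 400),
  ("Physics", "Dave", 200)
).toDF("Subject", "Teacher", "Fee")

// Rank rows within each subject by fee, highest first
val wSpec = Window.partitionBy($"Subject").orderBy($"Fee".desc)

// Keep only the top-ranked row per subject; the helper column "rn" is still present
val withRn = input.withColumn("rn", row_number().over(wSpec)).where($"rn" === 1)

// Same rows, with the helper column removed
val dfTop = withRn.drop("rn")

println(withRn.columns.mkString(", ")) // Subject, Teacher, Fee, rn
println(dfTop.columns.mkString(", "))  // Subject, Teacher, Fee

withRn.show()
dfTop.show()
```

Both DataFrames contain the same rows; the only difference is whether the rn helper column survives into the result.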