
How to sort a Spark DataFrame by Seq[org.apache.spark.sql.Column] in descending order?

I have a DataFrame like this:

// In spark-shell, `spark` is the predefined SparkSession
import spark.implicits._
val df = List(
  ("id1","blue","1")
  ,("id2","red","2")
  ,("id3","red","3")
  ,("id4","blue","3")
  ,("id4","green","3")
).toDF("id", "color", "size")

+---+-----+----+
| id|color|size|
+---+-----+----+
|id1| blue|   1|
|id2|  red|   2|
|id3|  red|   3|
|id4| blue|   3|
|id4|green|   3|
+---+-----+----+

I also have a Seq[org.apache.spark.sql.Column] that I can use to sort df like this:

import org.apache.spark.sql.Column
val col = Seq(new Column("size"), new Column("color"))
df.sort(col:_*).show

But I want to sort by col in descending order.

import org.apache.spark.sql.functions.desc

df.sort(desc(col:_*)) doesn't work.

So how can I sort df by col in descending order?

You can use col.map(_.desc) to build sort expressions with descending order:

val col = Seq(new Column("size"), new Column("color"))

// ascending
df.sort(col: _*).show
+---+-----+----+
| id|color|size|
+---+-----+----+
|id1| blue|   1|
|id2|  red|   2|
|id4| blue|   3|
|id4|green|   3|
|id3|  red|   3|
+---+-----+----+

// descending
df.sort(col.map(_.desc): _*).show
+---+-----+----+
| id|color|size|
+---+-----+----+
|id3|  red|   3|
|id4|green|   3|
|id4| blue|   3|
|id2|  red|   2|
|id1| blue|   1|
+---+-----+----+
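For comparison: org.apache.spark.sql.functions.desc takes a single column name (a String) and returns one descending sort Column, which is why passing a Seq[Column] to it with `: _*` does not work. If the sort keys are kept as column names rather than Column objects, a minimal sketch is to map desc over the names instead. The object name DescByNames, the helper sortDescByNames, and the local[*] session are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc

object DescByNames {
  // Hypothetical helper: sort `df` descending by each named column, in priority order.
  // functions.desc(columnName: String) returns a descending sort expression for one column.
  def sortDescByNames(df: DataFrame, names: Seq[String]): DataFrame =
    df.sort(names.map(desc(_)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("desc-by-names").getOrCreate()
    import spark.implicits._

    // Same sample data as in the question.
    val df = List(
      ("id1", "blue", "1"),
      ("id2", "red", "2"),
      ("id3", "red", "3"),
      ("id4", "blue", "3"),
      ("id4", "green", "3")
    ).toDF("id", "color", "size")

    sortDescByNames(df, Seq("size", "color")).show()
    spark.stop()
  }
}
```

Because the sort keys are the same (size, then color, both descending), this should produce the same row order as df.sort(col.map(_.desc): _*).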

Here, col.map(_.desc) returns a list of sort expressions:

col.map(_.desc)
// res2: Seq[org.apache.spark.sql.Column] = 
//       List(size DESC NULLS LAST, color DESC NULLS LAST)
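As the printed expressions show, Column.desc puts nulls last. If nulls should come first instead, Column.desc_nulls_first can be mapped over the same Seq[Column] in exactly the same way. The sketch below assumes a small hypothetical DataFrame in which one row (id5) has a null size; the data, the object name DescNullsFirst, and the helper sortDescNullsFirst are illustrative only.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object DescNullsFirst {
  // Hypothetical helper: descending sort on every key, with null values ordered first.
  def sortDescNullsFirst(df: DataFrame, keys: Seq[Column]): DataFrame =
    df.sort(keys.map(_.desc_nulls_first): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("desc-nulls-first").getOrCreate()
    import spark.implicits._

    // Hypothetical data: id5 has a null size so we can see where nulls land.
    val df = Seq(
      ("id1", "blue", Some("1")),
      ("id2", "red", Some("2")),
      ("id5", "red", None)
    ).toDF("id", "color", "size")

    val keys = Seq(new Column("size"), new Column("color"))
    sortDescNullsFirst(df, keys).show() // expected order: id5 (null size), id2, id1
    spark.stop()
  }
}
```

The only change from the answer above is the per-column method (desc_nulls_first instead of desc); the mapping over the Seq and the `: _*` expansion stay the same.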


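Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM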