java.lang.ClassCastException: org.apache.spark.sql.Column cannot be cast to scala.collection.Seq
How to sort spark DataFrame by Seq[org.apache.spark.sql.Column] in descending order?
I have a DataFrame like this:
import spark.implicits._
val df = List(
("id1","blue","1")
,("id2","red","2")
,("id3","red","3")
,("id4","blue","3")
,("id4","green","3")
).toDF("id", "color", "size")
+---+-----+----+
| id|color|size|
+---+-----+----+
|id1| blue|   1|
|id2|  red|   2|
|id3|  red|   3|
|id4| blue|   3|
|id4|green|   3|
+---+-----+----+
I also have a Seq[org.apache.spark.sql.Column], and I can sort df by it like this:
import org.apache.spark.sql.Column
val col = Seq(new Column("size"), new Column("color"))
df.sort(col:_*).show
But I want to sort by col in descending order.
import org.apache.spark.sql.functions.desc
df.sort(desc(col:_*))
does not work.
So how can I sort df by col in descending order?
You can use col.map(_.desc) to build sort expressions with descending order:
val col = Seq(new Column("size"), new Column("color"))
// ascending
df.sort(col: _*).show
+---+-----+----+
| id|color|size|
+---+-----+----+
|id1| blue|   1|
|id2|  red|   2|
|id4| blue|   3|
|id4|green|   3|
|id3|  red|   3|
+---+-----+----+
// descending
df.sort(col.map(_.desc): _*).show
+---+-----+----+
| id|color|size|
+---+-----+----+
|id3|  red|   3|
|id4|green|   3|
|id4| blue|   3|
|id2|  red|   2|
|id1| blue|   1|
+---+-----+----+
Here, col.map(_.desc) returns a list of sort expressions:
col.map(_.desc)
// res2: Seq[org.apache.spark.sql.Column] =
// List(size DESC NULLS LAST, color DESC NULLS LAST)
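As for why the attempt in the question fails: functions.desc takes a single column name as a String, so it cannot accept a Seq[Column] expanded with `: _*`. If you start from column names rather than Column objects, a rough sketch of the same idea might look like the following. The names `sortNames`, `byName` and `mixed`, and the local SparkSession setup, are illustrative assumptions and not part of the original question.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.desc

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("sort-desc-sketch")
  .getOrCreate()
import spark.implicits._

val df = List(
  ("id1", "blue", "1"),
  ("id2", "red", "2"),
  ("id3", "red", "3"),
  ("id4", "blue", "3"),
  ("id4", "green", "3")
).toDF("id", "color", "size")

// desc(columnName: String) builds one descending sort expression per name.
val sortNames = Seq("size", "color")
val byName: Seq[Column] = sortNames.map(name => desc(name))

// Because each element is its own expression, directions can also be mixed:
// size descending, color ascending.
val mixed: Seq[Column] = Seq(new Column("size").desc, new Column("color").asc)

// Collect (id, color) pairs so the resulting order is easy to inspect.
val orderByName = df.sort(byName: _*).select("id", "color").as[(String, String)].collect().toList
val orderMixed  = df.sort(mixed: _*).select("id", "color").as[(String, String)].collect().toList
```

With the sample data, `orderByName` should follow the same order as the descending output above, while `orderMixed` keeps size descending but lists the colors within each size alphabetically. Note that size is a string column here, so values are compared as strings rather than as numbers.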