List of columns for orderBy in a Spark DataFrame

Question

I have a list that contains column names. I am trying to use it to call orderBy on a DataFrame.

val l = List("COL1", "COL2")
df.orderBy(l.mkString(","))

But mkString combines the column names into a single string, which leads to this error:

org.apache.spark.sql.AnalysisException: cannot resolve '`COL1,COL2`' given input columns: [COL1, COL2, COL3, COL4];

How can I pass this list of strings as separate arguments, so that Spark looks for "COL1", "COL2" instead of "COL1,COL2"? Thanks.

Answer 1

You can call orderBy for a specific column:

import org.apache.spark.sql.functions._
// Ascending; to take the name from the list: df.orderBy(asc(l.headOption.getOrElse("COL1")))
df.orderBy(asc("COL1"))
// or, descending:
df.orderBy(desc("COL1"))

If you want to sort by multiple columns, you can write something like this:

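import spark.implicits._ // enables the $"..." syntax outside spark-shell; assumes a SparkSession named spark
val l = List($"COL1", $"COL2".desc)
df.sort(l: _*)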

Answer 2

Passing a single String argument tells Spark to sort the DataFrame by one column with that name. There is an overload that accepts multiple column names, and you can use it like this:

val l = List("COL1", "COL2")
df.orderBy(l.head, l.tail: _*)
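Note that l.head throws on an empty list. As a minimal sketch of an alternative, the names can be mapped to Column objects with col and expanded as varargs. The object name OrderByNames, the helper sortByNames, the sample data, and the local SparkSession are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object OrderByNames {
  // Hypothetical helper: sort by a list of column names (ascending).
  // Returns df unchanged when the list is empty, avoiding the l.head pitfall.
  def sortByNames(df: DataFrame, names: Seq[String]): DataFrame =
    if (names.isEmpty) df else df.orderBy(names.map(col): _*)

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orderBy-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the column names from the question.
    val df = Seq((2, "b", 1.0), (1, "z", 2.0), (1, "a", 3.0))
      .toDF("COL1", "COL2", "COL3")
    val l = List("COL1", "COL2")

    sortByNames(df, l).show()
    spark.stop()
  }
}
```

Both forms pass each name as its own argument, which is what the mkString call in the question failed to do.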

If you care about the sort direction (ascending or descending) of each column, use the Column version of orderBy instead:

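import spark.implicits._ // enables the $"..." syntax outside spark-shell; assumes a SparkSession named spark
val l = List($"COL1", $"COL2".desc)
df.orderBy(l: _*)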
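When the column names arrive as plain strings, the direction has to come from somewhere else. One possible sketch, assuming the caller supplies (name, ascending) pairs, is shown here. The object OrderBySpec and the helpers toSortExprs and sortBySpec are hypothetical names introduced for this example.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

object OrderBySpec {
  // Hypothetical helper: turn (columnName, ascending) pairs into sort expressions.
  def toSortExprs(spec: Seq[(String, Boolean)]): Seq[Column] =
    spec.map { case (name, ascending) =>
      if (ascending) col(name).asc else col(name).desc
    }

  // Apply the sort expressions; an empty spec leaves the DataFrame as it is.
  def sortBySpec(df: DataFrame, spec: Seq[(String, Boolean)]): DataFrame =
    if (spec.isEmpty) df else df.orderBy(toSortExprs(spec): _*)
}
```

Under these assumptions, OrderBySpec.sortBySpec(df, List("COL1" -> true, "COL2" -> false)) should sort the same way as the snippet above, since a Column without an explicit direction sorts ascending.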

Answer 1
1 2020-04-10 20:00:56

Answer 2
0 (accepted) 2020-04-10 20:39:41
