
List of columns for orderBy in a Spark DataFrame

I have a list of variables that contains column names. I am trying to use that to call orderBy on a dataframe.

val l = List("COL1", "COL2")
df.orderBy(l.mkString(","))

But mkString joins the column names into a single string, which leads to this error:

org.apache.spark.sql.AnalysisException: cannot resolve '`COL1,COL2`' given input columns: [COL1, COL2, COL3, COL4];

How can I pass this list so that Spark sorts by "COL1" and "COL2" as separate columns instead of looking for a single column named "COL1,COL2"? Thanks.

You can call orderBy on a specific column:

import org.apache.spark.sql.functions._
df.orderBy(asc("COL1")) // df.orderBy(asc(l.headOption.getOrElse("COL1")))
// OR
df.orderBy(desc("COL1"))

If you want to sort by multiple columns, you can write something like this:

val l = List($"COL1", $"COL2".desc)
df.sort(l: _*)

Passing a single String argument tells Spark to sort the DataFrame by one column with that exact name. There is an overload of orderBy that accepts multiple column names, and you can use it like this:

val l = List("COL1", "COL2")
df.orderBy(l.head, l.tail: _*)
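
If you would rather keep the list in one piece, you can also map the strings to Column objects and pass the result as varargs. A small sketch, assuming the same df and l as above:

import org.apache.spark.sql.functions.col

val l = List("COL1", "COL2")
df.orderBy(l.map(c => col(c)): _*)   // equivalent to df.orderBy(col("COL1"), col("COL2"))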

If you care about the sort direction (ascending vs. descending) of each column, use the Column version of orderBy instead:

val l = List($"COL1", $"COL2".desc)
df.orderBy(l: _*)
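
Putting it together, here is a minimal end-to-end sketch (assuming a local SparkSession and toy data, both illustrative) that builds the Column list from plain string names so each column can get its own direction:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("orderBy-list").getOrCreate()
import spark.implicits._

// toy DataFrame with the column names from the question
val df = Seq((3, "b"), (1, "a"), (2, "c")).toDF("COL1", "COL2")

// start from plain column names and turn them into Columns with directions
val names = List("COL1", "COL2")
val sortCols = names.map(n => col(n).desc)   // or .asc, or mix directions per column

df.orderBy(sortCols: _*).show()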

