Scala/Spark: How to select columns to read ONLY when list of columns > 0

I'm passing in a parameter fieldsToLoad: List[String]. I want to load ALL columns if this list is empty, and only the columns named in the list if it contains one or more columns. This is what I have now, which reads the columns passed in the list:

    val parquetDf = sparkSession.read.parquet(inputPath:_*).select(fieldsToLoad.head, fieldsToLoad.tail:_*)

But how do I add a condition to load * (all columns) when the list is empty?

You could use an if expression first to replace the empty list with just "*":

    val cols = if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")
    sparkSession.read.parquet(inputPath:_*).select(cols.head, cols.tail:_*)
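
The same fallback can be wrapped in a small helper. This is a minimal sketch in plain Scala; the names ColumnSelection and columnsOrAll are hypothetical and not part of Spark. Declaring the result as Seq[String] keeps both branches of the if at the same type, so .head and .tail are available on the result:

```scala
object ColumnSelection {
  // Hypothetical helper, not a Spark API.
  // Returns the requested column names, or Seq("*") when none were requested.
  def columnsOrAll(fieldsToLoad: Seq[String]): Seq[String] =
    if (fieldsToLoad.nonEmpty) fieldsToLoad else Seq("*")
}
```

With this in place, the read would look like val cols = ColumnSelection.columnsOrAll(fieldsToLoad) followed by .select(cols.head, cols.tail:_*).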

@Andy Hayden's answer is correct, but I want to show how the selectExpr function can simplify the selection, since it accepts the whole list as varargs with no need to split it into head and tail:

scala> val df = Range(1, 4).toList.map(x => (x, x + 1, x + 2)).toDF("c1", "c2", "c3")
df: org.apache.spark.sql.DataFrame = [c1: int, c2: int ... 1 more field]

scala> df.show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
|  3|  4|  5|
+---+---+---+


scala> val fieldsToLoad = List("c2", "c3")
fieldsToLoad: List[String] = List(c2, c3)

scala> df.selectExpr((if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")):_*).show()
+---+---+
| c2| c3|
+---+---+
|  2|  3|
|  3|  4|
|  4|  5|
+---+---+


scala> val fieldsToLoad = List()
fieldsToLoad: List[Nothing] = List()

scala> df.selectExpr((if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")):_*).show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
|  3|  4|  5|
+---+---+---+
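
An alternative that avoids the "*" placeholder is to skip the select entirely when the list is empty. The following is a sketch only, assuming Spark SQL is on the classpath; SelectRequested and selectRequested are hypothetical names introduced for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object SelectRequested {
  // Hypothetical helper, not a Spark API.
  // Returns df unchanged when no fields are requested;
  // otherwise keeps only the requested columns.
  def selectRequested(df: DataFrame, fieldsToLoad: Seq[String]): DataFrame =
    if (fieldsToLoad.isEmpty) df
    else df.select(fieldsToLoad.map(col): _*)
}
```

Applied to the question, this would be SelectRequested.selectRequested(sparkSession.read.parquet(inputPath:_*), fieldsToLoad). Unlike selectExpr, which parses each string as a SQL expression, col treats each string as a column name.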
