Scala/Spark: How to select columns to read ONLY when list of columns > 0
I'm passing in a parameter fieldsToLoad: List[String]. I want to load ALL columns if this list is empty, and only the columns specified in the list if it has one or more entries. I have this now, which reads the columns passed in the list:
val parquetDf = sparkSession.read.parquet(inputPath: _*).select(fieldsToLoad.head, fieldsToLoad.tail: _*)
But how do I add a condition to load * (all columns) when the list is empty?
You could use an if expression first to replace the empty list with just "*":
// Both branches must be the same collection type, so use List("*") rather than Array("*")
val cols = if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")
sparkSession.read.parquet(inputPath: _*).select(cols.head, cols.tail: _*)
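A variant of the same idea, as a minimal sketch: skip the select entirely when the list is empty instead of passing "*". The object name ColumnSelection and the method selectOrAll are hypothetical names made up for illustration; only DataFrame.select and functions.col come from Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object ColumnSelection {
  // Hypothetical helper (not part of Spark): returns the DataFrame unchanged
  // when no fields are requested, otherwise keeps only the requested columns.
  def selectOrAll(df: DataFrame, fieldsToLoad: Seq[String]): DataFrame =
    if (fieldsToLoad.isEmpty) df
    else df.select(fieldsToLoad.map(col): _*)
}
```

Assuming inputPath and fieldsToLoad are defined as in the question, it could be called as ColumnSelection.selectOrAll(sparkSession.read.parquet(inputPath: _*), fieldsToLoad). This also avoids the head/tail split, since select has an overload that takes Column varargs.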
@Andy Hayden's answer is correct, but I want to show how to use the selectExpr function to simplify the selection:
scala> val df = Range(1, 4).toList.map(x => (x, x + 1, x + 2)).toDF("c1", "c2", "c3")
df: org.apache.spark.sql.DataFrame = [c1: int, c2: int ... 1 more field]
scala> df.show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
|  3|  4|  5|
+---+---+---+
scala> val fieldsToLoad = List("c2", "c3")
fieldsToLoad: List[String] = List(c2, c3)
scala> df.selectExpr((if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")):_*).show()
+---+---+
| c2| c3|
+---+---+
|  2|  3|
|  3|  4|
|  4|  5|
+---+---+
scala> val fieldsToLoad = List()
fieldsToLoad: List[Nothing] = List()
scala> df.selectExpr((if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")):_*).show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  2|  3|  4|
|  3|  4|  5|
+---+---+---+
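If the fallback is needed in more than one place, the conditional from the session above could be pulled into a small function. This is only a sketch; SelectExprArgs and exprsOrStar are hypothetical names, not Spark API.

```scala
object SelectExprArgs {
  // Hypothetical helper (not part of Spark): falls back to the SQL wildcard
  // when no columns were requested.
  def exprsOrStar(fieldsToLoad: List[String]): List[String] =
    if (fieldsToLoad.nonEmpty) fieldsToLoad else List("*")
}
```

Usage would look like df.selectExpr(SelectExprArgs.exprsOrStar(fieldsToLoad): _*). Note that selectExpr parses each string as a SQL expression, so column names containing spaces or other special characters need to be wrapped in backticks.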