Scala _* to select a list of dataframe columns
I have a dataframe and a list of columns like this:
import spark.implicits._
import org.apache.spark.sql.functions._
val df = spark.createDataFrame(Seq(("Java", "20000"), ("Python", "100000"))).toDF("language","users_count")
val data_columns = List("language","users_count").map(x=>col(s"$x"))
Why does this work:
df.select(data_columns: _*).show()
But not this?
df.select($"language", data_columns:_*).show()
Gives the error:
error: no `: _*' annotation allowed here
(such annotations are only allowed in arguments to *-parameters)
And how do I get it to work, so that I can use _* to select all columns in a list while also specifying some other columns in the same select?
Thanks!
Update:
Based on @chinayangyangyong's answer below, this is how I solved it:
df.select( $"language" +: data_columns :_*)
This is because there is no method on DataFrame with the signature select(col: Column, cols: Column*): DataFrame, but there is one with the signature select(cols: Column*): DataFrame, which is why your first example works. A : _* argument has to fill the varargs parameter on its own, so it cannot follow an extra Column that would land in the same parameter.
Interestingly, your second example would work if you were using String to select the columns, since there is a method select(col: String, cols: String*): DataFrame. For example, with a list of column names rather than Column objects:
val column_names = List("language", "users_count")
df.select(column_names.head, column_names.tail: _*).show()
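The varargs rule described above can be reproduced without Spark. The following is a minimal sketch in plain Scala: Col and Frame are hypothetical stand-ins (not Spark classes) whose two select overloads only imitate the shapes of the signatures mentioned in this answer.

```scala
// Hypothetical stand-in for Spark's Column; not a Spark class.
final case class Col(name: String)

// Hypothetical stand-in for a DataFrame; it only returns the selected names.
object Frame {
  // Same shape as select(cols: Column*): a single varargs parameter.
  def select(cols: Col*): Seq[String] = cols.map(_.name)

  // Same shape as select(col: String, cols: String*): one fixed parameter, then varargs.
  def select(col: String, cols: String*): Seq[String] = col +: cols
}

object VarargsDemo {
  def main(args: Array[String]): Unit = {
    val dataColumns = List(Col("language"), Col("users_count"))

    // Compiles: the whole list is expanded into the single varargs parameter.
    println(Frame.select(dataColumns: _*))

    // Compiles: prepend the extra column first, then expand the combined list.
    println(Frame.select(Col("id") +: dataColumns: _*))

    // Does not compile: a `: _*` argument must be the only argument
    // supplied to the varargs parameter.
    // Frame.select(Col("id"), dataColumns: _*)

    // Compiles with the String overload: head fills `col`, tail expands into `cols`.
    val names = List("language", "users_count")
    println(Frame.select(names.head, names.tail: _*))
  }
}
```

The second call is the same pattern as the update in the question ($"language" +: data_columns), and the last call is the head/tail pattern that works when an overload has a fixed first parameter.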