
Scala: using _* to select a list of DataFrame columns

I have a dataframe and a list of columns like this:

import spark.implicits._   // needs a SparkSession named spark in scope, as in spark-shell
import org.apache.spark.sql.functions._

val df = spark.createDataFrame(Seq(("Java", "20000"), ("Python", "100000"))).toDF("language", "users_count")
val data_columns = List("language", "users_count").map(x => col(x))

Why does this work:

 df.select(data_columns: _*).show()

But not this?

 df.select($"language", data_columns:_*).show()

Gives the error:

 error: no `: _*' annotation allowed here
    (such annotations are only allowed in arguments to *-parameters) 

And how do I get it to work, so that I can use _* to select all the columns in the list while also specifying some other columns in the same select?



Based on @chinayangyangyong's answer below, this is how I solved it:

df.select($"language" +: data_columns: _*)
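The +: trick works because it builds one new list before the expansion, so the : _* argument is once more the only thing passed to the repeated parameter. The sketch below shows this and two related ways to combine single columns with a list (:+ to append, ++ to concatenate). It is plain Scala and assumes a hypothetical selectCols(cols: String*) standing in for Spark's select(cols: Column*), using strings instead of Column objects so it runs without Spark; the column names id and extra are made up for illustration.

```scala
object CombineDemo {
  // Hypothetical stand-in with the same shape as Dataset.select(cols: Column*).
  def selectCols(cols: String*): List[String] = cols.toList

  val dataColumns: List[String] = List("language", "users_count")

  // Prepend a single (hypothetical) column, then expand the combined list with `: _*`.
  val prepended: List[String] = selectCols(("id" +: dataColumns): _*)

  // Append a single (hypothetical) column instead.
  val appended: List[String] = selectCols((dataColumns :+ "extra"): _*)

  // Concatenate two lists of columns before expanding.
  val concatenated: List[String] = selectCols((List("id") ++ dataColumns): _*)
}
```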

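This happens because there is no method on DataFrame with the signature select(col: Column, cols: Column*): DataFrame; there is only one with the signature select(cols: Column*): DataFrame, which is why your first example works. Scala only allows a : _* argument when it is the sole argument supplied to a repeated (*) parameter, so $"language" and data_columns: _* cannot both be passed to cols.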
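The rule behind the compiler error can be reproduced without Spark. This minimal sketch assumes a hypothetical selectCols(cols: String*) in place of select(cols: Column*): the two accepted call shapes compile, while the mixed call is kept as a comment because it does not compile (the Scala 2 compiler rejects it with the error quoted in the question).

```scala
object VarargsRuleDemo {
  // Hypothetical stand-in with the same shape as Dataset.select(cols: Column*).
  def selectCols(cols: String*): List[String] = cols.toList

  val dataColumns: List[String] = List("language", "users_count")

  // Accepted: the expanded list is the only argument for the repeated parameter.
  val expanded: List[String] = selectCols(dataColumns: _*)

  // Accepted: plain individual arguments, no expansion.
  val individual: List[String] = selectCols("language", "users_count")

  // Rejected at compile time, because the repeated parameter would receive
  // one plain argument plus an expanded list:
  // selectCols("language", dataColumns: _*)
}
```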

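Interestingly, your second example would work if you were selecting the columns by their String names, since there is a method select(col: String, cols: String*): DataFrame. With a list of column names (rather than Column objects), you can pass its head and its tail separately: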

df.select(data_columns.head, data_columns.tail: _*).show()   // data_columns must be a List[String] here, e.g. List("language", "users_count")
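Overload resolution is what makes the String version compile while the Column version does not. The sketch below mirrors the two untyped Dataset.select overloads using a hypothetical Col case class in place of Spark's Column, so it runs without Spark; it illustrates the Scala rule and is not Spark's actual implementation.

```scala
object OverloadDemo {
  // Hypothetical stand-in for Spark's Column.
  final case class Col(name: String)

  // Mirrors the two untyped overloads of Dataset.select:
  //   select(cols: Column*) and select(col: String, cols: String*)
  def select(cols: Col*): List[String] = cols.map(_.name).toList
  def select(col: String, cols: String*): List[String] = col :: cols.toList

  val names: List[String] = List("language", "users_count")

  // Compiles: head fills `col`, and `tail: _*` is the sole argument for `cols`.
  val byName: List[String] = select(names.head, names.tail: _*)

  val columns: List[Col] = names.map(Col(_))

  // Only the Col* overload accepts columns, so the whole list must be expanded;
  // select(columns.head, columns.tail: _*) would not compile, as no (Col, Col*) overload exists.
  val byColumn: List[String] = select(columns: _*)
}
```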
