
Passing a sequence of column names to Spark's array function

I need to specify a sequence of columns. If I pass two strings, it works fine:

val cols = array("predicted1", "predicted2")

but if I pass a sequence or an array, I get an error:

 val cols = array(Seq("predicted1", "predicted2"))

Could you please help me? Many thanks!

You have at least two options here:

  1. Using a Seq[String]:

     val columns: Seq[String] = Seq("predicted1", "predicted2")
     array(columns.head, columns.tail: _*)

  2. Using a Seq[ColumnName]:

     val columns: Seq[ColumnName] = Seq($"predicted1", $"predicted2")
     array(columns: _*)

The function signature is def array(colName: String, colNames: String*): Column, which means it takes one string followed by zero or more additional strings. If you want to use a sequence, do it like this:

array("predicted1", Seq("predicted2"):_*)

From what I can see in the code, there are a couple of overloaded versions of this function, but none of them takes a Seq directly. So converting the sequence into varargs as described above is the way to go.
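The head/tail varargs trick can be illustrated without Spark at all. Below is a minimal plain-Scala sketch, where arrayLike is a hypothetical stand-in for the (colName: String, colNames: String*) shape of Spark's array:

```scala
// arrayLike mimics the shape of Spark's array(colName: String, colNames: String*);
// it just collects its arguments so the varargs expansion is visible.
def arrayLike(colName: String, colNames: String*): List[String] =
  colName +: colNames.toList

val cols: Seq[String] = Seq("predicted1", "predicted2")

// Split the sequence: first element positionally, the rest expanded with : _*
val fromSeq = arrayLike(cols.head, cols.tail: _*)

// Or mix a literal first argument with a Seq expanded to varargs
val mixed = arrayLike("predicted1", Seq("predicted2"): _*)

println(fromSeq) // List(predicted1, predicted2)
```

Both calls compile against the same single-string-plus-varargs signature, which is exactly what the head/tail split buys you with Spark's array.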

You can also use Spark's other overload, def array(cols: Column*): Column, for the case where you want a Seq[ColumnName] but only have plain strings rather than the $ column-name notation. Here is how to solve that:

import org.apache.spark.sql.ColumnName
import sqlContext.implicits._
import org.apache.spark.sql.functions._

val some_states: Seq[String] = Seq("state_AK","state_AL","state_AR","state_AZ")
val some_state_cols: Seq[ColumnName] = some_states.map(s => symbolToColumn(scala.Symbol(s)))

val some_array = array(some_state_cols: _*)

This uses Spark's symbolToColumn method, made available by the implicits import.

Alternatively, you can call the ColumnName constructor directly:

val some_state_cols: Seq[ColumnName] = some_states.map(s => new ColumnName(s))
val some_array = array(some_state_cols: _*)
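The map-then-expand pattern itself is plain Scala. Here is a hedged sketch in which ColumnNameLike and arrayLike are hypothetical stand-ins for Spark's ColumnName and the array(cols: Column*) overload, just to show the shape of the transformation:

```scala
// Hypothetical stand-in for Spark's ColumnName: wraps a column name string.
final case class ColumnNameLike(name: String)

// Stand-in for the array(cols: Column*) overload: collects the wrapped names.
def arrayLike(cols: ColumnNameLike*): List[String] =
  cols.map(_.name).toList

val some_states = Seq("state_AK", "state_AL", "state_AR", "state_AZ")

// Map each string to a column wrapper, then expand the Seq into varargs.
val some_state_cols = some_states.map(s => ColumnNameLike(s))
val combined = arrayLike(some_state_cols: _*)

println(combined) // List(state_AK, state_AL, state_AR, state_AZ)
```

With real Spark, the only differences are that the wrapper is ColumnName and arrayLike is functions.array.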
