Code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Column
def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame = {
  // some process
  newDf
}
I am trying to create a new, processed DataFrame from an existing raw DataFrame with the function above.
Code:
var processedDF = func(rawDF,"col1","col2")
Error:
<console>:73: error: type mismatch;
found : String("col1")
required: org.apache.spark.sql.Column
var processedDF = func(rawDF,"col1","col2")
^
Any suggestions on how to convert the String arguments to org.apache.spark.sql.Column so that they match the function's parameter types?
Either wrap the column names with the col function
import org.apache.spark.sql.functions.col
func(rawDF, col("col1"), col("col2"))
or look the columns up on the DataFrame itself
func(rawDF, rawDF("col1"), rawDF("col2"))
or provide a Column directly through $ (where spark is the SparkSession object)
import spark.implicits.StringToColumn
func(rawDF, $"col1", $"col2")
or through a Symbol (note that symbol literals such as 'col1 are deprecated as of Scala 2.13)
import spark.implicits.symbolToColumn
func(rawDF, 'col1, 'col2)
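If you would rather keep calling the function with plain strings, one more option is to add an overload that takes String names and delegates to the Column version. This is only a minimal sketch: the Processing and Demo objects, the sample data, and the orderBy body are assumptions made for illustration, since the real processing step is elided in the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object Processing {
  // Column-based version, same signature as in the question.
  // The body is a placeholder (assumption): it just sorts by the two columns.
  def func(rawDF: DataFrame, primaryKey: Column, orderKey: Column): DataFrame =
    rawDF.orderBy(primaryKey, orderKey)

  // String-based overload: converts the names with col() and delegates.
  def func(rawDF: DataFrame, primaryKey: String, orderKey: String): DataFrame =
    func(rawDF, col(primaryKey), col(orderKey))
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the column names used in the question.
    val rawDF = Seq((2, "b"), (1, "a")).toDF("col1", "col2")

    // Both call styles now compile.
    Processing.func(rawDF, "col1", "col2").show()
    Processing.func(rawDF, col("col1"), col("col2")).show()

    spark.stop()
  }
}
```

The two overloads are placed in one object because defining them on separate lines in the REPL would make the second definition shadow the first instead of overloading it.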