
Encrypt a CSV column via UDF, Spark - Scala

I am trying to encrypt a column in my CSV file using a UDF, but I am getting a compilation error. Here is my code:

import org.apache.spark.sql.functions.{col, udf}

val upperUDF1 = udf { str: String => Encryptor.aes(str) }

val rawDF = spark
      .read
      .format("csv")
      .option("header", "true")
      .load(inputPath)

rawDF.withColumn("id", upperUDF1("id")).show() // Compilation error

I am getting the compilation error on the last line. Am I using incorrect syntax? Thanks in advance.
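For context, the `Encryptor` object is not shown in the question. A minimal hypothetical stand-in (purely illustrative — it assumes AES/ECB with a hard-coded key, which is not safe for production) could look like this:

```scala
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

// Hypothetical stand-in for the Encryptor referenced in the question.
// AES/ECB with a fixed 16-byte key, Base64-encoded output.
// Illustrative only: ECB mode and hard-coded keys are insecure.
object Encryptor {
  private val key = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES")

  def aes(str: String): String = {
    val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
    cipher.init(Cipher.ENCRYPT_MODE, key)
    Base64.getEncoder.encodeToString(cipher.doFinal(str.getBytes("UTF-8")))
  }

  // Inverse operation, handy for verifying the round trip.
  def aesDecrypt(enc: String): String = {
    val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
    cipher.init(Cipher.DECRYPT_MODE, key)
    new String(cipher.doFinal(Base64.getDecoder.decode(enc)), "UTF-8")
  }
}
```

Any pure `String => String` function like this can be wrapped in a UDF as shown above; the error in the last line is unrelated to the encryption itself.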

You should pass a Column, not a String. You can reference a column with either of these syntaxes:

$"<columnName>"   (requires import spark.implicits._)
col("<columnName>")   (from org.apache.spark.sql.functions)

So you should try this:

rawDF.withColumn("id", upperUDF1($"id")).show()

or this:

rawDF.withColumn("id", upperUDF1(col("id"))).show()

Personally, I like the dollar syntax the most; it seems more elegant to me.

In addition to the answer from SCouto, you could also register your UDF as a Spark SQL function:

spark.udf.register("upperUDF2", upperUDF1)

Your subsequent select expression could then look like this:

rawDF.selectExpr("id", "upperUDF2(id)").show()
