
Removing newlines in a DataFrame field with udf function gives TypeTag Error

val trim: String => String = _.trim.replaceAll("[\\r\\n]", "")

def main(args: Array[String]): Unit = {
    val spark = ...
    ...
    import spark.implicits._
    val trimUDF = udf[String, String](trim)

    val df = spark.read.json(df_path)
    ...
    val fixed_dblogs_df = df.withColumn("qp_new", trimUDF('qp))
    ...
}

When I compile this code I get the following error:

No TypeTag available for String

The error is reported on the line where I define the UDF. I have no idea why this is happening: I have used UDFs before without seeing this error. I am using Spark 2.1.1.

The purpose of the code is to remove all newlines from one of my columns, which is of StringType.
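
Separately from the TypeTag error, note that on a Scala String, replace matches its first argument literally while replaceAll interprets it as a regular expression, so a character class such as [\\r\\n] only strips newlines with the latter. A minimal plain-Scala sketch of the difference (the object and value names here are hypothetical, chosen for illustration):

```scala
object NewlineCleanup {
  // String.replace treats its first argument as a literal, so the text
  // "[\r\n]" is searched for verbatim and real newlines are left in place.
  val literalReplace: String => String = _.trim.replace("[\\r\\n]", "")

  // String.replaceAll treats its first argument as a regular expression,
  // so every carriage return and line feed is removed.
  val regexReplace: String => String = _.trim.replaceAll("[\\r\\n]", "")

  def main(args: Array[String]): Unit = {
    val sample = "  first\r\nsecond\nthird  "
    println(literalReplace(sample).contains("\n")) // prints true
    println(regexReplace(sample))                  // prints firstsecondthird
  }
}
```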

Is there some reason you're using a UDF instead of the regexp_replace builtin?

val fixed_dblogs_df = df.withColumn("qp_new", regexp_replace('qp, "[\\r\\n]", "") ...)

UDFs are opaque to Spark's optimizer, so they break its plan optimization.
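
As a rough, self-contained sketch of the built-in approach, assuming a local SparkSession and a small in-memory DataFrame in place of the JSON input (the object name, helper name, and sample data are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object StripNewlines {
  // Removes carriage returns and line feeds from a string column using the
  // built-in regexp_replace, so no UDF (and no TypeTag lookup) is involved.
  def stripNewlines(df: DataFrame, source: String, target: String): DataFrame =
    df.withColumn(target, regexp_replace(col(source), "[\\r\\n]", ""))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would configure its own.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strip-newlines")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the data read from JSON in the question.
    val df = Seq("a\r\nb", "c\nd").toDF("qp")
    stripNewlines(df, "qp", "qp_new").show(false)

    spark.stop()
  }
}
```

Unlike the original UDF, this does not trim surrounding whitespace; if that is also wanted, wrapping the column in the built-in trim function from org.apache.spark.sql.functions should cover it.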
