简体   繁体   中英

Replace Empty values with nulls in Spark Dataframe

I have a data frame with n number of columns and I want to replace empty strings in all these columns with nulls.

I tried using

val ReadDf = rawDF.na.replace("columnA", Map( "" -> null));

and

val ReadDf = rawDF.withColumn("columnA", if($"columnA"=="") lit(null) else $"columnA" );

Both of them didn't work.

Any leads would be highly appreciated. Thanks.

Your first approach seams to fail due to a bug that prevents replace from being able to replace values with nulls, see here .

Your second approach fails because you're confusing driver-side Scala code for executor-side Dataframe instructions: your if-else expression would be evaluated once on the driver (and not per record); You'd want to replace it with a call to when function; Moreover, to compare a column's value you need to use the === operator, and not Scala's == which just compares the driver-side Column object:

import org.apache.spark.sql.functions._

rawDF.withColumn("columnA", when($"columnA" === "", lit(null)).otherwise($"columnA"))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM