Concatenate two columns of a Spark dataframe with null values

I have two columns in my Spark dataframe:

First_name  Last_name
Shiva       Kumar
Karthik     kumar
Shiva       Null
Null        Shiva

My requirement is to add a new column to the dataframe by concatenating the above two columns with a comma, handling null values as well.

I have tried using concat and coalesce, but I can't get the output to contain the comma delimiter only when both columns are available.

Expected output

Full_name
Shiva,kumar
Karthik,kumar
Shiva
Shiva

concat_ws does the concatenation for you and skips null values, so the separator is only placed between non-null columns.

import pyspark.sql.functions as F
df = df.withColumn('Full_Name', F.concat_ws(',', F.col('First_name'), F.col('Last_name')))

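Alternatively, you can use lit. This only works if the missing values are the literal string 'Null': with real nulls, concat returns null for the whole row, and the regexp_replace step has nothing to clean up.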

import pyspark.sql.functions as F

f = df.withColumn('Full_Name', F.concat(F.col('First_name'), F.lit(','), F.col('Last_name'))).select('Full_Name')

# remove the literal 'Null' placeholder together with its adjacent comma
f = f.withColumn('Full_Name', F.regexp_replace(F.col('Full_Name'), '(,Null)|(Null,)', ''))

f.show()

+-------------+
|    Full_Name|
+-------------+
|  Shiva,Kumar|
|Karthik,kumar|
|        Shiva|
|        Shiva|
+-------------+
