How to merge multiple columns into a single column using a UDF and remove the rows with a 0 value in PySpark

Question

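from pyspark.sql import functions as F
from pyspark.sql.types import StringType

lst_Cols = ["DES", "INV", "MKT", "SHO"]
# Replace nulls with the placeholder "0" so the UDF can test for it
df_Description3 = df_Description1.fillna(value="0", subset=lst_Cols)

def Merge(c1, c2, c3, c4):
    # Compare the column values (c1..c4), not the literal column names
    if c1 != "0":
        return c1
    elif c2 != "0":
        return c2
    elif c3 != "0":
        return c3
    elif c4 != "0":
        return c4
    # All four columns held the placeholder, so there is nothing to merge
    return None

myudf = F.udf(Merge, StringType())

# show() returns None, so keep the DataFrame and display it separately
df_Description3 = df_Description3.withColumn("Descriptions", myudf(*lst_Cols))
df_Description3.show()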

Answer 1

Using UDFs is generally not recommended because they can hurt performance, so here is a solution that uses built-in Spark functions instead. Note that concat_ws() skips nulls when merging the columns, so you don't need the extra step of filling them with "0" and removing it afterwards. If all four columns are null, concat_ws() returns an empty string, so you can drop those rows by filtering out the empty values.

from pyspark.sql import functions as F

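# Start from the original DataFrame: concat_ws() skips nulls, so no fillna("0") is needed
df_Description3 = df_Description1.withColumn("Descriptions", F.concat_ws(" ", "DES", "INV", "MKT", "SHO"))
# Rows where all four columns were null produce an empty string; drop them
df_Description3 = df_Description3.filter(F.col("Descriptions") != "")
df_Description3.show()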

solution1
0 2022-07-21 10:32:35
