I am using pyspark and I have a dataframe df_001 which contains N columns, including 'rec', 'id' and 'NAME'.
For example, I want to add a new column 'unq_id' that concatenates 'rec' and 'id'. When I do it like this, it works perfectly:
df_f_final = df_001.withColumn('unq_id', sf.concat(sf.col('rec'), sf.lit('||'), sf.col('id')))
However, I need the list of columns to concatenate to be dynamic. How can I do that? For example, I would create a list such as LL = ['rec', 'id', 'NAME'] or LL = ['rec', 'NAME'], and use it to generate the dataframe df_f_final by concatenating the columns that are in LL.
It is probably easy, but it is driving me crazy.
Thank you for your help.
Check this out and let me know if it helps.
# Input DataFrame
# +------+------+
# |rec_id|  name|
# +------+------+
# |    a1| ricky|
# |    b1|sachin|
# +------+------+
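from pyspark.sql import functions as F

LL = ['rec_id', 'name']
# Append "||" to every column in LL, then concatenate all the pieces.
df1 = df.withColumn("unq_id_value", F.concat(*[F.concat(F.col(c), F.lit("||")) for c in LL]))
# Trim the trailing "||" left over after the last column.
df2 = df1.withColumn("unq_id_value", F.expr("substring(unq_id_value, 1, length(unq_id_value)-2)"))
df2.show()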
# +------+------+------------+
# |rec_id|  name|unq_id_value|
# +------+------+------------+
# |    a1| ricky|   a1||ricky|
# |    b1|sachin|  b1||sachin|
# +------+------+------------+
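Spark also has a built-in `concat_ws` ("concatenate with separator") that takes the separator once, so there is no trailing `||` to trim afterwards. Below is a minimal sketch, assuming the same `df` and `LL` as in the answer; the helper names `with_unq_id` and `expected_unq_id` are made up for illustration. One difference to be aware of: `concat` returns null as soon as any input is null, while `concat_ws` skips null inputs.

```python
def with_unq_id(df, cols, sep="||", out_col="unq_id_value"):
    """Add out_col holding the values of cols joined by sep (hypothetical helper)."""
    # Imported lazily so this file can be loaded without a Spark install.
    from pyspark.sql import functions as F

    # concat_ws takes the separator first, then any number of columns or names.
    return df.withColumn(out_col, F.concat_ws(sep, *cols))


def expected_unq_id(row, cols, sep="||"):
    """Plain-Python picture of what concat_ws yields for one row given as a dict."""
    # Like concat_ws, skip null (None) values instead of returning null.
    return sep.join(row[c] for c in cols if row[c] is not None)


LL = ["rec_id", "name"]
print(expected_unq_id({"rec_id": "a1", "name": "ricky"}, LL))  # a1||ricky
# With a real DataFrame: df2 = with_unq_id(df, LL); df2.show()
```

Changing `LL` to another list of column names is the only thing needed to concatenate a different set of columns.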
Thank you, Loka, for your answer. I finally found a solution that is similar to yours. I did this and it works:
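from pyspark.sql import functions as sf
from pyspark.sql.types import StringType
cols = ['col1', sf.lit('||'), 'col2', sf.lit('||'), 'col3']
# The UDF receives the array built below as a Python list and joins its elements.
unq_id = sf.udf(lambda cols: "".join(cols), StringType())
df.withColumn('unqid', unq_id(sf.array(cols))).show()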
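The `cols` list above still has the separators written out by hand. If it should be built from a plain list of names instead, something like the following could work. It is only a sketch: `interleave` and `add_unq_id` are hypothetical names, and it swaps the Python UDF for the built-in `concat`, which is an assumption about what is acceptable here and not what was actually run in this thread.

```python
def interleave(items, sep):
    """Put sep between neighbouring items: [a, b, c] -> [a, sep, b, sep, c]."""
    parts = []
    for item in items:
        if parts:  # not the first item, so a separator goes in front of it
            parts.append(sep)
        parts.append(item)
    return parts


def add_unq_id(df, names, sep="||", out_col="unqid"):
    """Add out_col = the named columns concatenated with sep (hypothetical helper)."""
    # Imported lazily so interleave() can be tried without a Spark install.
    from pyspark.sql import functions as sf

    parts = interleave([sf.col(n) for n in names], sf.lit(sep))
    return df.withColumn(out_col, sf.concat(*parts))


LL = ["col1", "col2", "col3"]
print(interleave(LL, "||"))  # ['col1', '||', 'col2', '||', 'col3']
# With a real DataFrame: add_unq_id(df, LL).show()
```

Staying with built-in functions avoids shipping every row through a Python UDF. Note also that the UDF version raises a TypeError when one of the values is null, because `"".join` cannot join `None`.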