Concat all columns in a dataframe

I am writing Python in Databricks, using Spark 2.4.5.

I need a UDF that takes two parameters: the first is a DataFrame and the second is the SKid column in that DataFrame. The UDF should then hash all the columns of that DataFrame.

I have written the code below, but how can I concatenate all the columns when the DataFrame's columns are not known in advance?

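from pyspark.sql.functions import col, concat, lit, md5

def xHashDataframe(df, skColumn):
    # Keep the key column and add an MD5 hash of the hard-coded columns
    a = df.select(
        col(skColumn),
        md5(
            concat(
                col("column1"), lit("~"),
                col("column2"), lit("~"),
                ...
                col("columnN"), lit("~")
            )
        ).alias("RowHash")
    )
    return a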
  

There is no need to use a UDF. concat_ws should do the trick:

from pyspark.sql import functions as F

df.withColumn("RowHash", F.md5(F.concat_ws("~", *df.columns))).show(truncate=False)
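
To tie this back to the two-parameter function in the question, here is a minimal sketch rather than a definitive implementation. It makes a few assumptions that are not in the original post: the key column is left out of the hash, every column is cast to string, and nulls are replaced with an empty string. The last step is there because concat_ws skips null values, so without it two rows whose nulls sit in different columns could produce the same concatenated string. The helper name columns_to_hash and the sep parameter are hypothetical.

```python
def columns_to_hash(all_columns, sk_column):
    """Return every column name except the key column, keeping the order."""
    return [c for c in all_columns if c != sk_column]


def xHashDataframe(df, skColumn, sep="~"):
    """Return skColumn plus an MD5 RowHash built from the remaining columns."""
    # Imported lazily so columns_to_hash stays usable without Spark installed.
    from pyspark.sql import functions as F

    # Assumption: cast each column to string and turn nulls into "" so that
    # every column keeps its position in the concatenated value.
    parts = [
        F.coalesce(F.col(c).cast("string"), F.lit(""))
        for c in columns_to_hash(df.columns, skColumn)
    ]
    return df.select(
        F.col(skColumn),
        F.md5(F.concat_ws(sep, *parts)).alias("RowHash"),
    )
```

Calling xHashDataframe(df, "SKid") would return a two-column DataFrame with SKid and RowHash. If the key should be part of the hash as well, pass df.columns straight to the list comprehension instead of filtering it.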
