
How to concatenate all column values in a Spark DataFrame into a string in Python?

I am trying to concatenate all the values in a column into a single comma-separated string. In Scala, I did this with:

val pushLogIds = incLogIdDf.select($"interface_log_id").collect().map(_.getInt(0).toString).mkString(",")

I am new to Python, and after selecting the column and collecting its values, I cannot work out the Python logic to concatenate them into a string:

final_log_id_list = logidf.select("interface_log_id").collect()
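Note that `collect()` returns a list of `Row` objects on the driver, and a `Row` can be indexed like a tuple, so the concatenation itself is plain Python. A minimal sketch of that last step, using a hard-coded list of tuples as a stand-in for what `logidf.select("interface_log_id").collect()` would return (the data here is hypothetical):

```python
# Stand-in for logidf.select("interface_log_id").collect():
# collect() yields a list of Row objects; each behaves like a tuple.
rows = [(1,), (2,), (3,), (4,)]

# Pull out the first field of each row, convert to str, and join with commas.
final_log_ids = ",".join(str(r[0]) for r in rows)

print(final_log_ids)  # → 1,2,3,4
```

This mirrors what the Scala `.map(_.getInt(0).toString).mkString(",")` does after `collect()`.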

Ex:

interface_log_id
----------------
     1
     2
     3
     4

Output: a variable of String containing '1,2,3,4'

Could anyone let me know how to concatenate all the column values of a DataFrame into a single comma-separated string?

To convert a column to a single string, first gather the column into an array with collect_list, concatenate the array elements with concat_ws(",", ...), and finally pull the result back to the driver as a scalar with first():

from pyspark.sql import functions as F

df.agg(F.concat_ws(",", F.collect_list(F.col("interface_log_id")))).first()[0]
# '1,2,3,4'

Another way is collect_list followed by Python's ','.join, using map(str, ...) to handle numeric columns:

','.join(map(str, df.agg(F.collect_list(F.col("interface_log_id"))).first()[0]))
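In this variant, F.collect_list gathers the column into a single array column, and .first()[0] brings that array back to the driver as a plain Python list of ints; only the final join happens in Python. A sketch of that driver-side step, with a hard-coded list standing in for what the aggregation would return:

```python
# Stand-in for df.agg(F.collect_list(...)).first()[0]:
# the collected values come back as a plain Python list of ints.
collected = [1, 2, 3, 4]

# map(str, ...) converts each int before ','.join, which only accepts strings.
result = ",".join(map(str, collected))

print(result)  # → 1,2,3,4
```

Without the map(str, ...) step, ','.join would raise a TypeError on the int elements.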

Adding benchmarks:

%timeit ','.join(map(str,df.agg(F.collect_list(F.col("A"))).first()[0]))
#9.38 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.agg(F.concat_ws(",",F.collect_list(F.col("A")))).first()[0]
#9.46 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any question, please contact yoyou2525@163.com.
