How to concat all column values in a Spark dataframe into a String in Python?
I am trying to concat all the values in a column to make a string out of it with comma-separated values. To do that in Scala, I wrote the following code:
val pushLogIds = incLogIdDf.select($"interface_log_id").collect().map(_.getInt(0).toString).mkString(",")
I am new to Python, and after selecting the values in the column, I am unable to find a way to concat all the column values into a String after collecting them.
final_log_id_list = logidf.select("interface_log_id").collect()
Ex:
interface_log_id
----------------
1
2
3
4
Output: a variable of String containing '1,2,3,4'
Could anyone let me know how to concat all the column values of a dataframe into a single String of comma-separated values?
For converting a column to a single string, you can first collect the column as a list using collect_list, then concat with "," using concat_ws, and finally get the first value as a scalar using first:
from pyspark.sql import functions as F

df.agg(F.concat_ws(",", F.collect_list(F.col("interface_log_id")))).first()[0]
#'1,2,3,4'
Another way is collect_list and then using Python's ','.join with map, for numeric columns:
','.join(map(str,df.agg(F.collect_list(F.col("A"))).first()[0]))
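In this variant, first()[0] returns an ordinary Python list on the driver, so the final join step can be illustrated without Spark (the list below is a hypothetical stand-in for the collected column):

```python
# Stand-in for what df.agg(F.collect_list(F.col("A"))).first()[0] returns
collected = [1, 2, 3, 4]

# map(str, ...) casts the numeric values to strings before joining with commas
joined = ','.join(map(str, collected))
print(joined)  # 1,2,3,4
```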
Adding benchmarks:
%timeit ','.join(map(str,df.agg(F.collect_list(F.col("A"))).first()[0]))
#9.38 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.agg(F.concat_ws(",",F.collect_list(F.col("A")))).first()[0]
#9.46 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)