
How to concat all column values in a Spark DataFrame into a String in Python?

I am trying to concatenate all the values in a column into a single string of comma-separated values. In Scala, I wrote the following code:

val pushLogIds = incLogIdDf.select($"interface_log_id").collect().map(_.getInt(0).toString).mkString(",")

I am new to Python. After selecting and collecting the values in the column, I can't figure out how to concatenate them into a string in Python:

final_log_id_list = logidf.select("interface_log_id").collect()

Example:

interface_log_id
----------------
     1
     2
     3
     4

Expected output: a string variable containing '1,2,3,4'

Could anyone tell me how to concatenate all the values of a DataFrame column into a single comma-separated string?

To convert a column to a single string, first gather the column into a list with collect_list, then join the list elements with "," using concat_ws, and finally take the single aggregated row with first() and index it with [0] to get the plain string:

from pyspark.sql import functions as F

df.agg(F.concat_ws(",", F.collect_list(F.col("interface_log_id")))).first()[0]
#'1,2,3,4'

Another way is to use collect_list and then Python's ','.join. For numeric columns, map the values to str first, because str.join only accepts strings:

','.join(map(str,df.agg(F.collect_list(F.col("A"))).first()[0]))
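The same ','.join also works directly on the rows returned by collect(), which is the closest Python counterpart of the Scala code in the question. Below is a minimal sketch, assuming plain tuples stand in for pyspark.sql.Row objects (Row subclasses tuple, so row[0] behaves the same); join_first_column is a hypothetical helper name, not a PySpark API:

```python
from typing import Iterable, Sequence


def join_first_column(rows: Iterable[Sequence[object]], sep: str = ",") -> str:
    """Join the first field of each collected row into one string.

    Assumption: rows come from DataFrame.collect(); pyspark.sql.Row is a
    tuple subclass, so row[0] reads the first selected column.
    str() is needed because str.join raises TypeError on non-string items.
    """
    return sep.join(str(row[0]) for row in rows)


# Hypothetical stand-in for logidf.select("interface_log_id").collect()
final_log_id_list = [(1,), (2,), (3,), (4,)]

push_log_ids = join_first_column(final_log_id_list)
print(push_log_ids)  # 1,2,3,4
```

With a real DataFrame the call would be join_first_column(logidf.select("interface_log_id").collect()). Like the answers above, this pulls every value to the driver, so it is only suitable when the column fits in driver memory.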

Benchmarks, which show the two approaches performing about the same:

%timeit ','.join(map(str,df.agg(F.collect_list(F.col("A"))).first()[0]))
#9.38 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.agg(F.concat_ws(",",F.collect_list(F.col("A")))).first()[0]
#9.46 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

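Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.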

粤ICP备18138465号  © 2020-2024 STACKOOM.COM