How to concat values of columns in pyspark
I have a dataframe. The column "names" contains column headers whose values should be concatenated. I want to do this with pyspark concat_ws(), but nothing works. I must solve it with the concat_ws() function, no pandas etc.
The best I got was concatenated headers, but not the values in those columns. I couldn't return a list from the function to unpack it in concat_ws().
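from pyspark.sql.functions import col, concat_ws, split, udf

map_cols = {'a':'newA', 'b':'newB', 'c':'newC', 'd':'newD'}

@udf
def get_result(names_col):
    # Maps each key in names to its header name, so this yields headers, not column values
    headers = []
    for i in names_col:
        headers.append(map_cols[i])
    return headers

df = df.withColumn('names_arr', split('names', '_')).withColumn('result', concat_ws(';', get_result(col('names_arr'))))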
Input dataframe:

names   | newA | newB | newC | newD
-----------------------------------
a_b     | 1    | 2    | 7    | 8
a_b_c   | 2    | 3    | 4    | 4
a_b_c_d | 3    | 2    | 4    | 4
c_d     | 89   | 5    | 3    | 5
b_c_d   | 7    | 5    | 6    | 5

Expected output dataframe:

names   | newA | newB | newC | newD | result
--------------------------------------------
a_b     | 1    | 2    | 7    | 8    | 1;2
a_b_c   | 2    | 3    | 4    | 4    | 2;3;4
a_b_c_d | 3    | 2    | 4    | 4    | 2;3;4;4
c_d     | 89   | 5    | 3    | 5    | 3;5
b_c_d   | 7    | 5    | 6    | 5    | 5;6;5
I am assuming that the colA values in the last two rows of your expected output (89 and 7) are typos.
You can iterate over dataframe.columns and apply concat_ws:
# Data preparation skipped
# Import
import pyspark.sql.functions as f
df.show()
+-------+----+----+----+----+
| names|newA|newB|newC|newD|
+-------+----+----+----+----+
| a_b| 1| 2|null|null|
| a_b_c| 2| 3| 4|null|
|a_b_c_d| 3| 2| 4| 4|
| c_d|null|null| 3| 5|
| b_c_d|null| 5| 6| 5|
+-------+----+----+----+----+
Filter out the names column and concatenate the remaining columns with ; as the separator. concat_ws skips null values, so only the non-null columns end up in result:
df.withColumn('result', f.concat_ws(';', *[c for c in df.columns if c!='names'])).show()
+-------+----+----+----+----+-------+
| names|newA|newB|newC|newD| result|
+-------+----+----+----+----+-------+
| a_b| 1| 2|null|null| 1;2|
| a_b_c| 2| 3| 4|null| 2;3;4|
|a_b_c_d| 3| 2| 4| 4|3;2;4;4|
| c_d|null|null| 3| 5| 3;5|
| b_c_d|null| 5| 6| 5| 5;6;5|
+-------+----+----+----+----+-------+