
pyspark dataframe data transformation with unique column values

I am trying to learn PySpark and would like a solution using either SQL functionality or a DataFrame groupBy.

Thanks.

df1:

Name     Place     Product
AA       Germany   pencil
AA       Germany   pen
AA       Germany   pen
BB       Holland   hat
BB       Holland   hat
BB       Holland   pen
CC       USA       laptop
CC       USA       laptop
CC       USA       charger

Expected output:

Name     Place     Product
AA       Germany   pencil, pen
BB       Holland   hat, pen
CC       USA       laptop, charger

You can use collect_set as:

from pyspark.sql.functions import concat_ws, collect_set

df1.groupBy("Name", "Place").agg(concat_ws(", ", collect_set("Product")).alias("Product"))
