
Converting distinct values of a Spark dataframe column into a list

I have a dataset that looks like this:

+-------+-----+----------+--------------+
| Name  | Age | Pet Name | Phone Number |
+-------+-----+----------+--------------+
| Brett |  14 | Rover    | 123 456 7889 |
| Amy   |  15 | Ginger   | 123 456 8888 |
| Amy   |  15 | Polly    | 123 456 8888 |
| Josh  |  14 | Fido     | 312 456 9999 |
+-------+-----+----------+--------------+

I need to present it in the following format using Spark:

+-------+-----+---------------+--------------+
| Name  | Age |   Pet Name    | Phone Number |
+-------+-----+---------------+--------------+
| Brett |  14 | Rover         | 123 456 7889 |
| Amy   |  15 | Ginger, Polly | 123 456 8888 |
| Josh  |  14 | Fido          | 312 456 9999 |
+-------+-----+---------------+--------------+
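For reference, a minimal sketch (assuming a Spark shell where the SparkSession spark is available) that builds this sample data as a DataFrame named data, the name used in the snippets below:

import spark.implicits._  // for toDF and the $"column" syntax

// Sample data from the tables above
val data = Seq(
  ("Brett", 14, "Rover",  "123 456 7889"),
  ("Amy",   15, "Ginger", "123 456 8888"),
  ("Amy",   15, "Polly",  "123 456 8888"),
  ("Josh",  14, "Fido",   "312 456 9999")
).toDF("Name", "Age", "Pet Name", "Phone Number")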

Could someone help me with the best way to approach this?

You can group by Name and Age and collect Pet Name as a list, as shown below:

import org.apache.spark.sql.functions.{collect_list, first}

data.groupBy("Name", "Age")
  .agg(collect_list($"Pet Name").as("PetName"), first("Phone Number").as("PhoneNumber"))

Or, alternatively, you can do:

data.groupBy("Name", "Age", "Phone Number")
  .agg(collect_list($"Pet Name").as("PetName"))

Output:

+-----+---+---------------+------------+
|Name |Age|PetName        |PhoneNumber |
+-----+---+---------------+------------+
|Amy  |15 |[Ginger, Polly]|123 456 8888|
|Brett|14 |[Rover]        |123 456 7889|
|Josh |14 |[Fido]         |312 456 9999|
+-----+---+---------------+------------+

If you need a string instead of an array, you can use concat_ws:

data.groupBy("Name", "Age", "Phone Number")
  .agg(concat_ws(",",collect_list($"Pet Name")).as("PetName"))

Output:

+-----+---+------------+------------+
|Name |Age|Phone Number|PetName     |
+-----+---+------------+------------+
|Brett|14 |123 456 7889|Rover       |
|Amy  |15 |123 456 8888|Ginger,Polly|
|Josh |14 |312 456 9999|Fido        |
+-----+---+------------+------------+
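Since the question title asks for distinct values, note that collect_list keeps duplicates. If the same pet name could appear more than once per group and only unique values are wanted, Spark's collect_set can be substituted; a sketch, assuming the same data DataFrame (collect_set does not preserve element order):

import org.apache.spark.sql.functions.{collect_set, concat_ws}

// Deduplicate pet names within each group before joining them into a string
data.groupBy("Name", "Age", "Phone Number")
  .agg(concat_ws(",", collect_set($"Pet Name")).as("PetName"))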
