How to merge all unique values of a Spark DataFrame column into a single row per id and convert the column to JSON format
I want to merge all unique values of a Spark DataFrame column into a single row for each id, and convert that column to JSON format.
Sample input:
+---+------+-----------+
|id |gender|banner_desc|
+---+------+-----------+
|123|male  |banner1    |
|123|male  |banner2    |
|123|male  |banner3    |
|124|female|banner1    |
|124|female|banner2    |
|125|male  |banner1    |
|126|female|banner3    |
+---+------+-----------+
Sample output:
+---+------+-------------------------------------------------------------+
|id |gender|banner_desc                                                  |
+---+------+-------------------------------------------------------------+
|123|male  |[{"name":"banner1"}, {"name":"banner2"}, {"name":"banner3"}] |
|124|female|[{"name":"banner1"}, {"name":"banner2"}]                     |
|125|male  |[{"name":"banner1"}]                                         |
|126|female|[{"name":"banner3"}]                                         |
+---+------+-------------------------------------------------------------+
You can group by id and gender, then apply to_json to collect_list(struct(...)) to get the JSON string:
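import org.apache.spark.sql.functions.{col, collect_list, struct, to_json}

// Wrap each banner_desc in a struct with a single field "name",
// collect the structs per (id, gender), and serialize the array as JSON.
val result = df.groupBy("id", "gender").agg(
  to_json(
    collect_list(
      struct(col("banner_desc").as("name"))
    )
  ).as("banner_desc")
)
result.show(false)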
+---+------+----------------------------------------------------------+
|id |gender|banner_desc                                               |
+---+------+----------------------------------------------------------+
|124|female|[{"name":"banner1"},{"name":"banner2"}]                   |
|126|female|[{"name":"banner3"}]                                      |
|125|male  |[{"name":"banner1"}]                                      |
|123|male  |[{"name":"banner1"},{"name":"banner2"},{"name":"banner3"}]|
+---+------+----------------------------------------------------------+
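The question asks for unique values, while collect_list keeps every value it sees, duplicates included, and does not guarantee element order after a shuffle. If duplicates can occur in the real data, one possible variant is to swap in collect_set and wrap it in sort_array so the result is deduplicated and stable. The sketch below is only an illustration: the local SparkSession, the sample rows (including the repeated banner2 row for id 123), and the name deduped are assumptions that do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_set, sort_array, struct, to_json}

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("banner-json-sketch")
  .getOrCreate()
import spark.implicits._

// Assumption: sample rows modelled on the question, plus one deliberate duplicate.
val df = Seq(
  ("123", "male", "banner1"),
  ("123", "male", "banner2"),
  ("123", "male", "banner2"), // duplicate added for illustration
  ("123", "male", "banner3"),
  ("124", "female", "banner1"),
  ("124", "female", "banner2"),
  ("125", "male", "banner1"),
  ("126", "female", "banner3")
).toDF("id", "gender", "banner_desc")

val deduped = df.groupBy("id", "gender").agg(
  to_json(
    // collect_set drops repeated structs; sort_array orders them by the "name" field.
    sort_array(collect_set(struct(col("banner_desc").as("name"))))
  ).as("banner_desc")
)

deduped.orderBy("id").show(false)
```

Compared with the collect_list version, this trades the original row order (which was not guaranteed anyway) for a sorted, duplicate-free array per group.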