
Aggregate columns in a Spark dataframe as JSON

I have the following Spark dataframe, and I want to aggregate all of its columns into a JSON column, as follows. If the input dataframe is:

key,name,title
123,hsd,jds
148,sdf,qsz
589,qsz,aze

the expected result would be:

key,name,title,aggregation
123,hsd,jds,{"key":"123","name":"hsd", "title":"jds"}
148,sdf,qsz,{"key":"148","name":"sdf", "title":"qsz"}
589,qsz,aze,{"key":"589","name":"qsz", "title":"aze"}

The solution should not hard-code the field names. Do you know how to do this?

You can use the to_json function:

import org.apache.spark.sql.functions._
import spark.implicits._ // provides toDF and the $ column syntax; assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

// Wrap the columns in a struct and serialize it to a JSON string
df.withColumn("aggregation", to_json(struct($"key", $"name", $"title")))
  .show(false)

If you have many columns, you can build the struct from df.columns instead, so nothing is hard-coded:

df.withColumn("aggregation", to_json(struct(df.columns.map(col): _*))) 

Output:

+---+----+-----+--------------------------------------+
|key|name|title|aggregation                           |
+---+----+-----+--------------------------------------+
|123|hsd |jds  |{"key":123,"name":"hsd","title":"jds"}|
|148|sdf |qsz  |{"key":148,"name":"sdf","title":"qsz"}|
|589|qsz |aze  |{"key":589,"name":"qsz","title":"aze"}|
+---+----+-----+--------------------------------------+
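As a side note, Spark can also expand all columns with a star inside struct, which avoids mapping over df.columns by hand. A minimal sketch of that shorthand, using the same df as above (assumes star expansion inside struct, which recent Spark versions support):

// "*" expands to every column of df inside the struct
df.withColumn("aggregation", to_json(struct(col("*"))))
  .show(false)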

Using to_json, but with more flexibility over the columns:

import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val dfA = Seq(
  (123, "hsd", "jds"),
  (148, "sdf", "qsz"),
  (589, "qsz", "aze")
).toDF("key", "name", "title")

// Build a map of columnName -> columnValue pairs and serialize it to JSON
dfA.withColumn("aggregation", to_json(
  map(dfA.columns.flatMap(columnName => Seq(lit(columnName), col(columnName))): _*))
).show(truncate = false)


+---+----+-----+----------------------------------------+
|key|name|title|aggregation                             |
+---+----+-----+----------------------------------------+
|123|hsd |jds  |{"key":"123","name":"hsd","title":"jds"}|
|148|sdf |qsz  |{"key":"148","name":"sdf","title":"qsz"}|
|589|qsz |aze  |{"key":"589","name":"qsz","title":"aze"}|
+---+----+-----+----------------------------------------+
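Note the difference between the two outputs: struct preserves each column's type, so the integer key is serialized as 123 in the first answer, while map requires all of its values to share one type, so every value is cast to a string and key becomes "123", which matches the question's expected output. Since the column list is built programmatically, it can also be filtered; a minimal sketch, assuming a hypothetical requirement to leave the title column out of the JSON:

// Hypothetical: aggregate every column except "title"
val keep = dfA.columns.filterNot(_ == "title")
dfA.withColumn("aggregation", to_json(
  map(keep.flatMap(c => Seq(lit(c), col(c))): _*))
).show(truncate = false)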

