Add a new Column in Spark DataFrame which contains the sum of all values of one column - Scala/Spark
Scala Spark Dataframe sum list of json values in the column
I have a Spark dataframe like the following:
ID | col1 | col2 |
---|---|---|
1 | [{"a":1}] | [{"d":3,"e":4}] |
2 | [{"a":2}] | [{"d":5,"e":10}] |
I want to get the following dataframe:
ID | col2_sum |
---|---|
1 | 7 |
2 | 15 |
Data types:
id: StringType
col1: StringType
col2: StringType
Thanks in advance.
Use from_json to convert the JSON string into an array of maps, then use the aggregate function to sum the map values:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  (1, """[{"a":1}]""", """[{"d": 3, "e": 4}]"""),
  (2, """[{"a":2}]""", """[{"d": 5, "e": 10}]""")
).toDF("id", "col1", "col2")

val df1 = (df
  // parse the JSON string into an array of maps
  .withColumn("col2", from_json(col("col2"), lit("array<map<string,int>>")))
  // turn each map into its values, then flatten into one array of ints per row
  .withColumn("col2", flatten(expr("transform(col2, x -> map_values(x))")))
  // fold the array, accumulating the sum of its elements
  .withColumn("col2_sum", expr("aggregate(col2, 0, (acc, x) -> acc + x)"))
  .drop("col1", "col2")
)
df1.show
//+---+--------+
//| id|col2_sum|
//+---+--------+
//| 1| 7|
//| 2| 15|
//+---+--------+
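As a sanity check, the transform → flatten → aggregate pipeline above is equivalent to an ordinary fold over each row's map values. A minimal plain-Scala sketch of the same per-row reduction (no Spark, data hard-coded to mirror col2 after parsing):

```scala
// Each row of col2, after from_json, is an array of maps.
val rows = Seq(
  Seq(Map("d" -> 3, "e" -> 4)),
  Seq(Map("d" -> 5, "e" -> 10))
)

// transform(col2, x -> map_values(x)) + flatten  ≙  flatMap(_.values)
// aggregate(col2, 0, (acc, x) -> acc + x)        ≙  sum
val sums = rows.map(row => row.flatMap(_.values).sum)
// sums == Seq(7, 15)
```

This mirrors why row 1 yields 7 (3 + 4) and row 2 yields 15 (5 + 10) in the Spark output.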