Scala spark aggregate a set of columns in a data frame to a JSON string
Given a data frame,
+---+-----+--------+---------+
| id| name| payable| strategy|
+---+-----+--------+---------+
|  0|  Joe|     100|     st-1|
|  1|  Tom|     200|     st-2|
|  2| John|     300|     st-1|
+---+-----+--------+---------+
What would be the most efficient way to convert each row to a JSON string such as the following?
{
  "payload": {
    "name": "Joe",
    "payments": [
      {
        "strategy": "st-1",
        "payable": 100
      }
    ]
  }
}
Currently I have a UDF that manually stringifies the provided columns, but I'm wondering whether there is a better way to achieve this. The to_json method is the best alternative I have found so far, but it takes only one column as input.
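For reference, the current UDF approach is roughly of the following shape (a simplified sketch; buildPayload and the string construction are illustrative, and df stands for the data frame shown above):

import org.apache.spark.sql.functions.udf
import spark.implicits._   // available in a spark-shell session

// Build the JSON string by hand for each row.
val buildPayload = udf { (name: String, payable: Int, strategy: String) =>
  s"""{"payload":{"name":"$name","payments":[{"strategy":"$strategy","payable":$payable}]}}"""
}

val jsonDf = df.select(buildPayload($"name", $"payable", $"strategy") as "jsonValue")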
Using to_json() is the correct approach, but the contents need to be nested into struct columns with the appropriate shape first:
import org.apache.spark.sql.functions._
import spark.implicits._   // assumes a spark-shell / SparkSession named spark

val df = Seq((0, "Joe", 100, "st-1"), (1, "Tom", 200, "st-2"))
  .toDF("id", "name", "payable", "strategy")

// Nest name and the (strategy, payable) pair into the target shape,
// then serialize the outer struct with to_json.
val result = df.select(
  to_json(struct(
    struct($"name",
      array(struct($"strategy", $"payable")) as "payments"
    ) as "payload")
  ) as "jsonValue"
)
result.show(false)
+-------------------------------------------------------------------------+
|jsonValue |
+-------------------------------------------------------------------------+
|{"payload":{"name":"Joe","payments":[{"strategy":"st-1","payable":100}]}}|
|{"payload":{"name":"Tom","payments":[{"strategy":"st-2","payable":200}]}}|
+-------------------------------------------------------------------------+
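If several rows per name were supposed to end up in a single payments array instead of one JSON document per row, the same struct/to_json pattern extends with groupBy and collect_list (a sketch, reusing df and the imports from above):

import org.apache.spark.sql.functions.{collect_list, struct, to_json}

// Collect all (strategy, payable) pairs per name into one array before serializing.
val aggregated = df
  .groupBy($"name")
  .agg(collect_list(struct($"strategy", $"payable")) as "payments")
  .select(to_json(struct(struct($"name", $"payments") as "payload")) as "jsonValue")

aggregated.show(false)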